ABSTRACT 


The purpose of this thesis was to design an information storage and 
retrieval system for multiple choice test items on the IBM 360/67 computer 
which would accommodate both measurement and document retrievals. 

The data management organization involved setting up a sequential
file with manually coded indexes. It was implemented specifically for
the Department of Internal Medicine, though specifications are given to
indicate the generality of the design to other educational areas.
Provision was made for reducing user error and maximizing the usability
of output. Documentation specifying hardware requirements, cost of
implementation, human requirements, and source listings of all
Fortran IV programs used was also given.

Modification procedures were provided for updating the information
bank.


CHAPTER I


INTRODUCTION 


Statement and Importance of Problem 


Machines called computers were so named because the only significant
work given to them was computation. To describe their potentialities,
however, one might consider Flanagan's (1966) suggestion for a more
appropriate name--information machine. McCarthy (1966) maintains this
information machine is becoming the contemporary counterpart of the steam
engine that brought on the industrial revolution--the computer heralding
the information revolution. Grunberger (1966) shows it has a versatility,
logical flexibility, and ability to grow that are not matched by
anything short of a living organism.

Thus in the last 15 to 20 years an information science has developed
as a result of the use of these machines. Cuadra (1966) avoids writing
a definition of information science, preferring to indicate the areas
of study that can be considered as belonging to this field. Taylor
(1966) also avoids a formal definition, but does show how information
science can be viewed from either an operational or a pedagogical point
of view, since it ranges through a spectrum from services at one end (for
example, libraries) through system design to basic research.

The use of information storage and retrieval systems is a matter
of everyday experience for literate people. The public library,
correspondence files, accounting systems, directories, dictionaries, and
so on are all information storage and retrieval systems. All consist
of records to which one may address a variety of allowable questions
with a reasonable expectation of retrieving a selection of records in
response to each question.

Operationally all such systems employ only three basic processes 
--the analysis of records, the derivation of new records from old ones, 
and the physical displacement of records over a distance. These same 
processes are used in machine retrieval systems. The discussion of the 
development of the retrieval system for this study will be held within 
the frame of reference provided by these three basic processes. 

Specifically the study concerns itself with the design,
implementation, and evaluation of a system for the storage and retrieval
of multiple choice test items. Since the Internal Medicine department of
the Royal College of Physicians and Surgeons of Canada expressed a desire
for the development of such a system, the design was applied to their
needs. The emphasis of this study, however, is the wider applicability
of such systems in general education. A further objective is that the
principles underlying the design will provide a basis upon which other
educators and administrators can build similar services for retrieval of
large masses of data. These principles cover the programming techniques
for reducing human error and increasing the usability of output, the
file organization, the search and indexing procedures, and the
incorporation of the computer hardware and facilities available at this
university.

The Department of Internal Medicine was one of the first medical
specialties in Canada to use multiple choice questions for its annual
examination for entrance to the Royal College of Physicians and Surgeons.
This department also saw the need for preserving acceptable items for the
development of a suitably large bank from which future examinations could
be constructed. Having doctors and their clerical staff handle such a
bank constitutes an inefficient use of their time and effort. Thus the
need was expressed for a machine-based retrieval system.

Spring (1967) found a deficiency in the research literature
concerning the application of information science in medicine:

. . . every article on the subject either outlines the desired
scope and then describes one or two subsystems in development, or
describes a system so tailored to the situation in which, and for
which, it has been developed that it has little applicability
elsewhere . . . the literature is devoid of reports of the
applicable, broadly useful, complex, debugged, effective, functioning,
multipurpose systems so needed in medicine . . . most applications
are truly in the research stage. . . . [p. 312]

As shall be seen in Chapter II, this problem is not the only one
encountered when one attempts to find an existing system applicable to
the need the doctors expressed. It was felt, therefore, that the task of
designing and implementing a system for retrieving multiple choice
examination items was most worthwhile. Not only would it serve the needs
of Internal Medicine, but also those of other departments--Surgery,
Obstetrics and Gynecology, etc. There would of course be as wide a need
and application in any educational department using similar examination
questions.

The particular advantage of using a computer for retrieval is
pointed out by Baruch (1966). He feels that the computer's greatest
assistance is

. . . in those areas of medicine where the computer can act as an
adjunct to the human in tasks that intelligent humans seldom do
particularly well. The areas of sorting, filing, indexing, searching,
and particularly of being alert for low probability occurrences are
the kind of 'light thinking' that computers can do well and that
intelligent people do poorly [p. 27].

It is reasonable to assume Baruch's conclusion applies not only to
medicine, but to other disciplines as well, including education and test
development.


Procedure Followed 


This system was designed and implemented for use on the IBM 360/67
computer in the Fortran IV programming language. The original bank of
items was stored on a formatted tape, but during retrieval executions it
was transferred to the direct access medium of disk, through which a
sequential search was made. However, since the system design is an
integral part of this study it will be dealt with in detail in
Chapter III.


Limitations of the Study 


Information storage and retrieval systems operate as follows. Records
are gathered and inserted into a collection, possibly with indexing to
impose some order. The user addresses a question to the collection and
on this basis a search is made, with pertinent records being retrieved.

It is important to note that records are created and organized before 
the specific questions a system is to answer have been stated. That is 
to say, the system is created in anticipation of needs that are not 
fully known. 

A meaningful question a designer must ask then is, "Is it possible 
to devise an information storage and retrieval system that will conven- 
iently retrieve pertinent records in response to all possible questions?" 
Unfortunately it is not. According to Lipetz (1966), 

Not only is it impossible to create an information storage and
retrieval system that will respond to all possible questions but
also it would be prohibitively expensive to try to approximate
such a condition. In practice all information systems and retrieval
systems must adopt a more modest object [p. 178].

Designing systems to satisfy unstated needs may sound impossible,
yet systems are being designed to do just this. By extrapolating from
past interests and trends, systems can be developed that give somewhat
adequate retrievals. Indeed, this probably is the only rational approach
to design analysis.

With this in mind the following limitations were imposed on this
study. (a) A sequential file organization was used, rather than a random
or list organization (all three are explained in Chapter II). (b) No
attempt was made to program the computer to do automatic indexing; only
manual indexing was incorporated. (c) A simple four-level hierarchy
determined the ranking of relevant output. (d) One iterative procedure
was provided to improve user satisfaction.

Since each of these limitations is an integral part of the system
design (Chapter III), and because the decisions to incorporate them were
made on the basis of the suggestions of other researchers (Chapter II),
they are dealt with in detail in these later chapters.


Definition of Terms 


No terms are used in this study with connotations foreign to the
field of information science. However, because this study may hold some
interest for educators who do not necessarily have a background in the
computing field, some of the more basic terms in the vernacular of
information science which are not defined later in this text are
presented here:

block--the number of records brought into the active memory of the
computer;

core--the active memory and workspace in a computer;

data set--the media containing data, such as cards, tapes, disk, etc.;

disk--a direct access medium used by the computer in lieu of tape,
cards, etc.;

formatted--information specified to the computer in non-binary
characters;

logical record length--number of characters or variables in a record;

record--that which is defined by one READ or WRITE statement;

unformatted--information specified without the control of the
programmer and contained in binary characters;

information retrieval systems--systems which retrieve facts
(measurements) in response to requests;

reference retrieval system--system which retrieves documents or
citations in response to requests.


There are a number of other terms with which the reader may be
unacquainted. An attempt is made, however, to explain those terms as
the system design unfolds in later chapters.


CHAPTER II 


SOME PERTINENT LITERATURE 


Introduction 


The problems involved in assessing library automation literature are
considerable. Some are due to the complexity of the library processes
and the computer techniques being described. Others are due to the
manner in which library automation literature is being produced and
published, or for that matter--not published. A proliferation of formal
documentation exists in addition to private channels of circulation.
Markuson (1967, p. 255) maintains that such communication channels
arise because only relatively few people are engaged in library
automation activities and they are beset by continuous, often
conflicting, demands on their time. Thus when a retrieval system
begins to function, any documentation that results often travels through
informal channels of communication. A reviewer must be concerned,
therefore, with tracking down elusive items if he is to be up-to-date on
current work. Cuadra's three volumes of the Annual Review of
Information Science and Technology (1966, 1967, 1968) and the
Association for Computing Machinery publication Computing Reviews,
Volumes 7-10 (1966, 1967, 1968, 1969), are of particular help to a
reviewer in finding relevant studies, and this reviewer acknowledges
their contribution to this chapter.

Besides the difficulty of finding relevant articles on library
automation, Black and Farley (1966) point out that:

almost none of them include system specifications, design
specifications, design projections, personnel requirements or
procedural manuals, or such important details as input card
formats, tape formats, outputs, running time, costs, problems
encountered, or solution thereto [p. 386].

Thus the lack of writing in this field at a high level of practical
detail, together with a concentration upon the equipment available to
the institutions in which the work has taken place, makes it difficult
to assess the relative worth of the studies. Hypothetical installations
still on the drawing board may offer promise for the future but provide
little help to the present attempts to design information systems.

The format of the remainder of this chapter will be as follows. 
First, a discussion of the principles and practices underlying the theory 
of information retrieval will be provided. Following this, a summary 
will be given of the literature related to such design specifications as 
(a) file organization including sequential, random, and list files, (b) 
data coding techniques with particular reference to manual and automatic 
indexing, and (c) associative techniques in file searching. Finally a
review will be made of some of the more noteworthy retrieval systems
already in operation in this field. As such they provide a backdrop
against which an evaluation may be made of their relevance with respect
to the particular design involved in this study.


Principles and Practices Related to Information Theory 


Vickery (1965) maintains ". . . there is yet no unified theory of
retrieval systems and a good deal of retrieval practice is still an
empirical art, unsullied by theory [p. 399]." Saltzbery (1963), however,
tries to delimit the area by dealing with three underlying principles of
the quantitative aspects of the efficient storage and communication of
information--the measure of information, the storage of information, and
the communication of information.


Information Measure 

Although information theory is essentially a mathematical subject, 
a basic understanding of the underlying principles can be acquired 
without resorting to complex mathematical arguments. 

Saltzbery (1963) defines one bit of information as the amount of 
information necessary to resolve two equally likely alternatives. If 
the uncertainty is greater, then the amount of information necessary to 
remove it is greater. It follows that a message which identifies one of 
eight equally likely alternatives contains more information than a 
message which resolves only four equally likely alternatives. 

Consider, for example, a simple game in which one is asked to guess
a number between one and eight. With no a priori knowledge, the
probability of guessing correctly is 1/8. In the language of information
theory one would ask: how much information is needed to resolve the
receiver's uncertainty, assuming one asks only questions to which the
answer can only be yes or no? In this example the minimum number of
such questions is three. Is the number greater than four? If the
answer is yes, is the number greater than six? If the answer is no,
is the number five? If the answer is no, one has resolved all his
uncertainty: the number must be six. Hence the receipt of one bit of
information raised the probability of a correct guess from 1/8 to 1/4,
and two and three bits of information raised it to 1/2 and 1
respectively.

This illustration serves to introduce one to the basic concepts
from which the quantitative definition of information can be obtained,
namely, the number of equally probable states of a system (here eight),
the number of alternatives resolved by each question (here two, because
of the binary nature of the question) and the minimum number of questions
necessary to determine the state of the system (in this case, three).
Thus the relationship is defined as $2^3 = 8$. In the vernacular of
information theory one would say that three bits of information are
necessary to determine the state of such a system. That is, three
appropriately chosen questions, each of which resolves two alternatives,
reduce indeterminacy to certainty; the problem of choosing appropriate
questions is analogous to that of choosing an appropriate code, which for
this study will be dealt with in Chapter III (cf. pp. 41-46).
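
The guessing game can be sketched in a few lines of Python (used here purely for illustration; it is not the Fortran IV of the system itself). The function names are hypothetical, and the halving strategy is one possible choice of "appropriately chosen questions": its third question differs slightly from the one quoted in the text but resolves the same two alternatives.

```python
import math


def questions_needed(n_states):
    """Minimum number of yes/no questions that single out one of
    n_states equally likely alternatives (the number of bits)."""
    return math.ceil(math.log2(n_states))


def guess(secret, low=1, high=8):
    """Locate `secret` in the range low..high by repeated halving.

    Returns (value found, number of binary questions asked).
    """
    asked = 0
    while low < high:
        mid = (low + high) // 2
        asked += 1                  # one question: "is it greater than mid?"
        if secret > mid:
            low = mid + 1           # answer yes: keep the upper half
        else:
            high = mid              # answer no: keep the lower half
    return low, asked


found, asked = guess(6)
print(found, asked, questions_needed(8))
```

For eight equally likely numbers every secret is found with exactly three questions, in agreement with the relationship 2^3 = 8.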


Information Storage 

The problem of storing information is essentially one of making a 
representation. This representation can take any form as long as the 
original can be reconstructed at will. Therefore, one simply has to 
ensure that every possible event is recorded and can be represented in 
the information bank. The capacity of the bank simply is the total
number of listing states which it will admit. This topic will be dealt
with under blocking factors in Chapter III (cf. pp. 74-76).


Communication of Information 
Now let us consider information theory as it pertains to the
communication of information. If one assumes information received is the
difference between the state of knowledge of the recipients before and
after the communication, then one can understand Saltsberg's (1963,
p. 11) more precise definition:

$$I = \log_2 \frac{P_a}{P_b}$$

where $I$ is the information received, $P_a$ is the probability of an
event at the receiver after a message is received, and $P_b$ is the
probability of an event at the receiver before the message was
received. For example, in receiving a message regarding the sex of a
baby, where the receiver does not know its sex, $P_b = 1/2$ and
$P_a = 1$. Therefore $I = \log_2 \frac{1}{1/2} = \log_2 2 = 1$, that is,
one bit of information.
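
As a small numerical check of this definition, the following Python sketch evaluates the information received from the two probabilities. The function name is hypothetical, and base-2 logarithms are assumed because the result is expressed in bits.

```python
import math


def information_received(p_before, p_after):
    """Information in bits: log2 of the ratio of the probability of the
    event after the message to its probability before the message."""
    return math.log2(p_after / p_before)


# Sex of a baby: probability 1/2 before the message, 1 after it.
print(information_received(0.5, 1.0))
# Number-guessing game: probability 1/8 before, 1 after.
print(information_received(1 / 8, 1.0))
```

The first call gives one bit and the second gives three bits, matching the baby example and the earlier guessing game.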

However, as Gove (1957, p. 7) points out, all that is being
measured in such a case is the number of binary questions (ones and
zeros in a computer's language) and not the amount of understanding.
Hence, while information can be measured, its unit being the bit, the
question of whether or not meaning is measurable so far remains
unanswered for want of an acceptable unit of measurement. However,
Abramson (1963) maintains the semantic aspects of communication are
irrelevant to the design problems of efficiently storing, retrieving
and sending information.

One further aspect of information theory should be considered, that
of information language, which is not a language for programming
information retrieval systems, nor is it a query language for
interrogation of the system. It is, however, a language used to represent
the content of a document. Soergel (1967) enumerates the requirements to
be met by such a language: (a) unambiguous, (b) flexible, (c) in natural
language or transformable to it, (d) adaptable to the subject matter,
(e) capable of detecting incompleteness or inconsistency, (f) capable of
encompassing the scope and variety encountered, (g) meeting the
requirements of


automatic processing equipment, and (h) capable of minimizing the time,
cost, and effort requirements of processing.

Information theory thus provides us with certain principles for
analyzing and improving the design of storage and communication processes.
These principles will be referred to in the design of this system
(Chapter III) and its evaluation (Chapter IV).


File Organization 


The literature (Dodd, 1969; Salton, 1968, pp. 243-252) gives two
main reasons for the utilization of distinctive data files. Firstly,
the demands of users differ. Each type of demand may call for a different
type of file organization, depending, for example, on whether or not
rapid storage is necessary, how thorough a retrieval is required, what
time limits are imposed on updating the file, the need for holding
overhead cost at a minimum, the nature of output, and so forth. All
these factors influence the type of file the user will choose.

Secondly, the characteristics of the medium in which the data is 
stored need not necessarily be compatible with the demands of the appli- 
cation. Hence, different file organizations help to compromise between 
user requirements and the physical limitations of the user's computer 
hardware, such as amount of core storage, number of data sets available, 
capability of handling processes, and so forth. 

Basically all data organizations are built on three types of files
--sequential, random, and list. Since file organization forms the heart
of an information storage and retrieval system, this reviewer has felt
it necessary to review in some detail how these three files work. The
reader is referred to the manual produced by IBM (1966) on direct access
storage devices and Dodd's article (1969) on "Elements of Data Management
System" for a more thorough and more technical explanation. It is
imperative for the reader to understand the data organization methods
available in order to understand many of the aspects of the current
problem as discussed in Chapter III. Furthermore, such an understanding
provides a basis upon which one may evaluate the system used in this
study, a discussion of which is provided in Chapter IV.


Sequential Organization 
This method, which is also referred to in the literature as Direct 
File Organization, is undoubtedly the best known. Records are stored in 
positions relative to other records according to a specified sequence. 
This sequence may be in order by document number or other common attri- 
butes, or records may simply be in the order of their arrival in the 
bank. In any case items to be retrieved are identified by a sequential 
scan of the complete file. The ramifications of using such a file are 
varied--some advantageous, others imposing restrictions. 
Salton (1968) states: 
. . . if the information is to be retrievable according to a variety 
of different keys--for example, subject identifiers, year of publi- 
cation, author name, publication, and so on--the direct file system 
is often the only practical one, since it is not usually possible 
to store many copies of the same file to account for the various 
desired file orders [p. 244]. 
This point made by Salton has very direct implications
for this study and the reader is urged to keep it in mind in the reading
of subsequent chapters. The response time for sequential file searches
is necessarily slow, however, since a complete file scan is generally
needed before any information can be retrieved. In a sequential file
search the key of the first record is examined; if the key is not
correct the next record is examined and so on, until the correct record
is found.
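
A minimal Python sketch of such a search follows. The record layout (a dictionary with "key" and "topic" fields), the sample items, and the function names are all hypothetical; they stand in for test items in the bank and are not the record format used in this study.

```python
# Hypothetical bank of records held in their stored order.
ITEMS = [
    {"key": 101, "topic": "RENAL"},
    {"key": 102, "topic": "CARDIAC"},
    {"key": 103, "topic": "RENAL"},
]


def find_first(records, key):
    """Examine keys one after another until the wanted key is found.

    Returns (record or None, number of keys examined).
    """
    examined = 0
    for record in records:
        examined += 1
        if record["key"] == key:
            return record, examined
    return None, examined


def scan_all(records, predicate):
    """Complete file scan: every record is examined, and any attribute
    of the record may serve as the retrieval key."""
    return [record for record in records if predicate(record)]


print(find_first(ITEMS, 103))
print(scan_all(ITEMS, lambda r: r["topic"] == "RENAL"))
```

The second function illustrates Salton's point: a single stored order can serve requests on any attribute, at the price of scanning the whole file.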

Updating records in direct files is also troublesome. If a new
record is shorter or longer than the original record, parts or all of
the adjacent records would likely be destroyed when the record is rewritten.
Even more inconvenience is encountered in updating blocked records. To
do so is impossible unless the entire block is rewritten.

For these reasons the rewriting of sequential files is usually done
by copying records from one data set to another as needed. This is
necessarily expensive, and users would therefore resort to this
procedure only when a number of records are to be altered.

Another difficulty is encountered if one attempts to insert new, or 
remove old records from such a file. An insertion requires that already 
stored records be "pushed apart" and of course the converse is true 
for the removal of old records. New records of course could be added 
out of sequence to the end of a file and sorted later into proper 
sequence. Such a process again leads the user to copying the entire file 
onto a new data set, such as a new tape. 
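
The copy-based update described above can be sketched as follows. This is an illustration under assumed conventions only: records are dictionaries carrying a "key" field, the file is a Python list, and the function name and parameters are hypothetical.

```python
def copy_with_updates(old_file, replacements, deletions, additions):
    """Build a new sequential file by copying the old one record by record.

    replacements: mapping of key -> replacement record
    deletions:    set of keys to drop
    additions:    new records, possibly out of sequence
    The old file is left untouched, as when copying to a new tape.
    """
    new_file = []
    for record in old_file:
        key = record["key"]
        if key in deletions:
            continue                                     # removal: skip the record
        new_file.append(replacements.get(key, record))   # alter, or copy as is
    new_file.extend(additions)                           # added at the end of the file
    new_file.sort(key=lambda r: r["key"])                # sorted later into sequence
    return new_file


old = [{"key": 1, "text": "a"}, {"key": 2, "text": "b"}, {"key": 4, "text": "d"}]
new = copy_with_updates(
    old,
    replacements={2: {"key": 2, "text": "B"}},
    deletions={4},
    additions=[{"key": 3, "text": "c"}],
)
print(new)
```

Every record is handled once whether or not it changes, which is why the procedure pays off only when several alterations are batched together.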

The difficulties encountered in the above data organization no
doubt led system analysts to develop a file which could eliminate some
of these difficulties. The result was random organization.


Random Organization 


In this system, records are stored and referenced on the basis of
the relationship between the key of a record and the direct address of
the location where the record is stored. This latter address is used
when a record is stored and used again when the record is to be retrieved.
Three methods are generally used for accessing records--direct address,
dictionary look-up, and calculation.

Direct address is used if the programmer, knowing the precise size 
and number of records in his data file, is able to supply the direct 
address at storage and retrieval times. This address is then used to 
access a record on storage media. 
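
A minimal sketch of the direct address method, assuming fixed-length records so that the address can be computed from the record number alone. The record length of 16 bytes, the in-memory buffer standing in for the disk, and the function names are assumptions made for illustration.

```python
import io

RECORD_LENGTH = 16   # assumed fixed record size in bytes
RECORD_COUNT = 4     # assumed number of records in the data file


def write_record(store, record_number, text):
    """Store `text` at the address implied by its record number."""
    store.seek(record_number * RECORD_LENGTH)          # the direct address
    data = text.encode("ascii").ljust(RECORD_LENGTH)   # pad to fixed length
    store.write(data[:RECORD_LENGTH])


def read_record(store, record_number):
    """Fetch one record without examining any other record."""
    store.seek(record_number * RECORD_LENGTH)
    return store.read(RECORD_LENGTH).decode("ascii").rstrip()


# A pre-allocated, blank-filled buffer stands in for the disk.
disk = io.BytesIO(b" " * (RECORD_LENGTH * RECORD_COUNT))
write_record(disk, 2, "ITEM-C")
write_record(disk, 0, "ITEM-A")
print(read_record(disk, 2), read_record(disk, 0))
```

The records may be written and read in any order, since each address depends only on the record number and the known record size.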

With the method of dictionary look-up a record's direct address is 
obtained prior to storage or retrieval with both the record's key and 
its direct address being stored in a dictionary. When a record is stored 
or retrieved the key is found in this dictionary and the corresponding 
direct address is used. For example, if the key, "RENAL," is compared 
with the dictionary and the direct address 1557 is found, then the direct 
address 1557 is used to store and subsequently, if necessary, to retrieve 
the record whose key is "RENAL." 

The use of a dictionary ensures that each record has a unique 
address. However, to achieve this the dictionary must be large enough to 
include all potential direct addresses and it may occupy as much space 
as the data itself. Also the step-by-step sequential search of a 
dictionary may offset the advantages gained by unique record addresses. 
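
The dictionary look-up method, including the step-by-step search of the dictionary, might be sketched as below. Only the pair ("RENAL", 1557) comes from the example in the text; the other entries, the names, and the use of a Python mapping to stand in for the disk are hypothetical.

```python
# The dictionary pairs each key with a direct address fixed in advance.
DICTIONARY = [("CARDIAC", 1021), ("HEPATIC", 1340), ("RENAL", 1557)]

disk = {}   # direct address -> record; stands in for the storage medium


def look_up(key):
    """Search the dictionary entry by entry.

    Returns (direct address, number of entries compared).
    """
    for compared, (entry_key, address) in enumerate(DICTIONARY, start=1):
        if entry_key == key:
            return address, compared
    raise KeyError(key)


def store(key, record):
    address, _ = look_up(key)
    disk[address] = record


def retrieve(key):
    address, _ = look_up(key)
    return disk[address]


store("RENAL", "item on renal function")
print(look_up("RENAL"), retrieve("RENAL"))
```

Each key has its own address, so no two records collide, but the count of entries compared shows how a long dictionary searched sequentially can erode that advantage.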

The third method--calculation--involves converting the key of a
record to a direct address. This procedure, however, does not necessarily
ensure that the address is unique. For example, each letter of the key
"RENAL" could be replaced by a number--R by 18 (because R is the 18th
letter in the alphabet), E by 5 and so on to L by 12. The sum of these
numbers, 50, is now the direct address of the record whose key is
"RENAL." However, the same direct address would be obtained for the key
"LANER." This "overflow" is usually handled by pointers; if the record
retrieved at the first address is not the desired record, then the
pointer stored with that record is used to retrieve another record having
the same calculated address. This sequence is continued until the correct
record is found. This principle is used in the IBM 1500 system for
retrieving course material.
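
The calculation method and its overflow handling can be sketched as follows. The letter-sum rule is the one given in the text; the class, its method names, and the choice of an overflow area beginning at address 1000 are assumptions for illustration, and the sketch assumes upper-case alphabetic keys whose sums stay below that area.

```python
def calculated_address(key):
    """Sum of alphabet positions of the letters: R=18, E=5, and so on."""
    return sum(ord(letter) - ord("A") + 1 for letter in key.upper())


class CalculatedFile:
    def __init__(self):
        self.slots = {}             # address -> [key, record, pointer or None]
        self.next_overflow = 1000   # assumed start of the overflow area

    def store(self, key, record):
        address = calculated_address(key)
        if address not in self.slots:
            self.slots[address] = [key, record, None]
            return address
        # Address already occupied: follow the pointers to the end of the chain.
        while self.slots[address][2] is not None:
            address = self.slots[address][2]
        overflow = self.next_overflow
        self.next_overflow += 1
        self.slots[overflow] = [key, record, None]
        self.slots[address][2] = overflow   # link the new record into the chain
        return overflow

    def retrieve(self, key):
        address = calculated_address(key)
        while address is not None and address in self.slots:
            slot_key, record, pointer = self.slots[address]
            if slot_key == key:
                return record
            address = pointer               # not the desired record: follow pointer
        return None


bank = CalculatedFile()
print(bank.store("RENAL", "renal item"), bank.store("LANER", "laner item"))
print(bank.retrieve("LANER"))
```

Both keys calculate to address 50, so the second record is placed in the overflow area and reached through the pointer held with the first.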

Bleier and Vorhaus (1968) found the use of this type of file
organization had the following advantages. Queries are answered rapidly
since one can operate upon the list of addresses of records to access
relevant records. Since only a small portion of the complete file is
examined, this system is certainly more efficient in terms of time than
sequential files, in which the complete file is searched. These
researchers also found that the size of a data base has very little
effect on the speed of retrieval.

However, Bleier and Vorhaus also point out the disadvantages
encountered in a random organization. Firstly, there are increased
storage requirements to handle the list of addresses in core. Therefore,
one must weigh the cost of core against the cost of time to
determine optimal economy of usage. Furthermore, they found a
significant increase in the complexity of maintaining the system. This is
not a surprising conclusion, since one can see that the programming
ramifications would be much greater than in the case of sequential files
if one had to program the handling of overflow problems or the
manipulation of large unwieldy dictionaries. Dodd (1969) points to
another disadvantage.

Although random organization does allow for rapid access of a
particular record with a known key, it is not suited for rapidly
accessing a number of records. This limitation is imposed by the
time taken by the hardware access mechanism to locate a record
[p. 122].

One further point that Dodd (1969) and the IBM (1966) manual make,
and that is pertinent to this study, is that all records used
under random organization are generally of a uniform length. The
implication of these limitations will be discussed in the first section
of Chapter III (cf. pp. 31-36).


List Organization 

The use of pointers in the calculation method of random file 
organization leads one to the third main file organization, that of 
list files. In sequential organizations the next logical record is the 
next physical record. In list organizations, however, a pointer accom- 
panying each record serves as an address to the next logical record which 
may or may not be the next physical record. There are three basic types 
of list files--the simple list structure, the inverted list structure, 
and the ring structure. 

Simple list structure is in practice a sequential file that may, 
for example, have physical records at locations 23, 59, 117, 1105 but 
which may be sequentially, that is logically, retrieved together by the 
use of pointers. Initially there is a pointer to the first record 
number 23, that record points to number 59, and so forth. If a record 
is to be updated, records can be placed anywhere within the list by 
changing the pointer of the record preceding the new record and inserting 
a new pointer in the record inserted. Conversely, the removal of an 
item requires changing the pointer in the preceeding record to point to 


the record following the one deleted. 
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There are, however, serious limitations to this type of data 
management. Firstly, it is a rare retrieval system in which items belong 
exclusively to one category and have no relationship to other categories. 
Indeed, as shall later be seen in the section of this chapter dealing 
with operating systems, most users demand, and systems incorporate, means 
whereby items can be ranked according to their degree of association 
with not one but a number of categories. To use this facility in the 
simple list structure requires that each record have more than one, 
possibly many, pointers to and from it. Each record must therefore 
become a member of many lists. A deletion of a record which is a member 
of several lists then becomes immensely more difficult since one must 
find the preceding records of all lists that point to the record being 
deleted. Furthermore, extra pointers must be stored so that the 
preceding as well as the next record of the list may be found. Similar 
complexities, of course, are encountered for any additions of records. 
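
The pointer handling described for the simple list structure can be
sketched as follows. This is only an illustrative sketch in Python, not
part of the system designed in this thesis; the slot numbers follow the
example in the text, while the names `records`, `insert_after`, and
`delete` are assumptions made for the illustration.

```python
# Hypothetical sketch: physical slots hold records; each record carries the
# slot number of its logical successor (None marks the end of the list).
records = {
    23: {"data": "item A", "next": 59},
    59: {"data": "item B", "next": 117},
    117: {"data": "item C", "next": 1105},
    1105: {"data": "item D", "next": None},
}
head = 23  # pointer to the first logical record


def traverse(records, head):
    """Return the physical locations in logical order by following pointers."""
    order, slot = [], head
    while slot is not None:
        order.append(slot)
        slot = records[slot]["next"]
    return order


def insert_after(records, prev_slot, new_slot, data):
    """Place a record in any free slot and splice it in after prev_slot."""
    records[new_slot] = {"data": data, "next": records[prev_slot]["next"]}
    records[prev_slot]["next"] = new_slot


def delete(records, head, slot):
    """Unlink a record; the predecessor must be found by walking the list."""
    if slot == head:
        new_head = records[slot]["next"]
        del records[slot]
        return new_head
    prev = head
    while records[prev]["next"] != slot:
        prev = records[prev]["next"]
    records[prev]["next"] = records[slot]["next"]
    del records[slot]
    return head


insert_after(records, 59, 7, "item B2")  # slot 7 lies physically before 23
head = delete(records, head, 117)
logical_order = traverse(records, head)
```

The walk inside `delete` is the cost referred to above: with a single
forward pointer the predecessor can only be found by searching, and a
record belonging to several lists would need that search repeated once
per list.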

Inverted list structure is one in which the restriction on list 
length is taken to its ultimate conclusion, that is, the list length is 
restricted to one, and each key appears in the index. The index thus 
points directly to the record requested and no further pointers are used. 
Hence, the list has become inverted, a condition which lends its name
to this procedure. The reader will see that, in practice, such
a file is much the same as a random organization using the dictionary
look-up method. Like that procedure, inverted files have the
advantages of providing relatively efficient access to all data
and of being suited to retrieval requests which are unpredictable
rather than specific. Similarly the two procedures share the same disadvantages,
the main one being the requirement of a large dictionary to be stored 
in core. 
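
A minimal sketch of the inverted organization is given below, assuming a
small in-memory dictionary; the record locations and keys are
hypothetical. Every key appears in the index and the index leads
directly to the record locations, so no pointers are stored in the
records themselves.

```python
from collections import defaultdict

# Hypothetical records: physical location -> set of keys describing the record.
records = {
    23: {"allergy", "immunology"},
    59: {"neurology"},
    117: {"allergy"},
}


def build_index(records):
    """Build the dictionary: each key maps to the locations that carry it."""
    index = defaultdict(list)
    for location in sorted(records):
        for key in records[location]:
            index[key].append(location)
    return dict(index)


def lookup(index, *keys):
    """Locations of records carrying every requested key."""
    hits = [set(index.get(k, [])) for k in keys]
    return sorted(set.intersection(*hits)) if hits else []


index = build_index(records)
```

In this sketch the whole of `index` must be held in memory, which
corresponds to the disadvantage noted above of keeping a large
dictionary in core.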

Ring structure is an extension of list organization, but instead 
of the last record in a list being given no further pointers, a pointer 
is made back to the first record. This first record in a particular 
list is given a special symbol to indicate that it is the first record 
in that ring. Furthermore, all records pertaining to a given ring list 
except this first record have two pointers--one to the previous logical 
record and one to the first record. It will be recalled that records 
may be included in more than one list structure. The same facility can 
be used here also. 

Such ring structures may prove to be very powerful as they provide 
a facility to retrieve and process all records in any one ring while 
being able to branch off at any or each of the records to be retrieved 
and process other records which are logically related. Such other records 
would also be stored in a ring structure and in turn permit the same 
facility; this nesting is carried through to any level required by the 
logical relationships. However, at any given record in any given list 
there are two pointers, one being to the previous logical record, the 
other to the first record of that ring. Therefore, having obtained one 
record at a given hierarchical level one would also be able to retrieve
as many records, with the same hierarchical characteristics, as were in
that given list. 

To illustrate the use of this procedure, assume a record is 
retrieved by the key "Allergy." If a record is found and retrieved by 
this key it is not possible to assume it has any other desired
characteristics one may wish it to have. Therefore when a key is found with
"allergy" in the ring list of specialities, this provides a branching
off point to search through a ring list of allergies. For example, if 
one wished to retrieve records that not only dealt with allergy but also 
immunology then the ring list of allergy would be searched till the key 
immunology was found. This hierarchy could be carried to any desired 
level and an appropriately desirable record retrieved. If more records 
were needed with the same characteristics the pointer of the retrieved 
item would be used to indicate the first record in the last nested ring 
that was searched. Eliminating the need to search all previously
nested rings clearly reduces retrieval time.
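
The nested search in this illustration can be sketched as follows. The
sketch is an assumption-laden simplification: each record carries one
`link` pointer to the adjacent logical record (its direction around the
ring is immaterial here) and one `head` pointer to the first record of
its ring, and the class names and the keys other than "allergy" and
"immunology" are hypothetical.

```python
class Record:
    def __init__(self, key, sub_ring=None):
        self.key = key
        self.link = None          # to the adjacent logical record in the ring
        self.head = None          # to the first record of the ring
        self.sub_ring = sub_ring  # optional nested ring to branch into


class Ring:
    def __init__(self, records):
        self.first = records[0]   # the specially marked first record
        for rec, nxt in zip(records, records[1:] + records[:1]):
            rec.link = nxt        # the last record links back to the first
            rec.head = self.first

    def find(self, key):
        """Walk the ring once; return the matching record or None."""
        rec = self.first
        while True:
            if rec.key == key:
                return rec
            rec = rec.link
            if rec is self.first:  # came full circle without a match
                return None


def members(record):
    """Keys of every record in the ring that `record` belongs to."""
    out, rec = [], record.head
    while True:
        out.append(rec.key)
        rec = rec.link
        if rec is record.head:
            return out


allergies = Ring([Record("dermatology"), Record("immunology"),
                  Record("pharmacology")])
specialities = Ring([Record("cardiology"),
                     Record("allergy", sub_ring=allergies),
                     Record("neurology")])

branch = specialities.find("allergy")        # branching-off point
hit = branch.sub_ring.find("immunology")     # search the nested ring
same_ring = members(hit)                     # further records, no re-search
```

Because `hit.head` leads straight to the start of the last nested ring,
`members` obtains further records with the same characteristics without
returning to the ring of specialities.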

However, it is also obvious that the complexity of the ring 
structure imposes no small task on the programmer in originally setting 
up such a system. Furthermore, the problem of updating one's storage 
file would be almost as complex as the original program. Added to this 
is the costly disadvantage that such an organization requires added core 
space in order to handle all the pointers. 

In concluding this section on file organization it should be noted 
that there are even more complex data structures based on the three main 
organizations of sequential, random, and list. Multilist, cellular 
multilist, indexed sequential, and tree organizations are but a few of 
the somewhat exotic techniques that have been developed. These techniques, 
however, are considered to be too costly and perhaps impossible to 
implement at the present time at this university. The reasons for this 
will be discussed in the first section of Chapter III. In all cases,
however, one must weigh, as Salton (1968, Ch. 7) has suggested, the
advantages and disadvantages of each file organization and try to
choose the data management design best suited to one's particular
needs.


Data Coding Techniques 


Data coding is a term applied to the transformation of data represen- 
tations meaningful to the external world into a mode more suitable for 
machine processing. It is usually done to reduce space needed to store 
data or to provide a more suitable statistical distribution of terms in 
the store. In most information and retrieval systems, some sort of 
comparison is made between a user's request and these coded records. A 
certain degree of favorable comparison will retrieve that record for the 
user, hopefully meeting his needs. However, as has already been
pointed out (cf. p. 4), no retrieval system is able to be all things
to all people. From past interests, however, reasonable objectives can
be set.

Once these objectives have been determined, a match strategy can 
also be specified. This strategy, so chosen, will in turn determine how 
one will code his data. In general there are two widely used techniques 
for coding. The first is an a priori approach in which records entering 
store have been manually coded. The other is to turn the indexing over 
to the computer for automatic coding. These two techniques and the
ramifications of their use will now be discussed.


Manual Indexing 

Lipetz (1966) points out that "Satisfactory comparison . . . requires 
the ability to recognize the important features in the word. This is 
not an easy task to turn over to a machine [p. 177]." Abelson (1968,
p. 419) agrees with this point of view, emphasizing the need for human
judgment in information retrieval. He feels that professionals in 
individual fields of scientific research are essential custodians of 
knowledge who cannot be replaced by archives of any kind. It follows 
that those who feel this way would have more confidence in a system which 
retrieves records which were indexed manually before entering core. 

The manual indexing procedure usually follows three basic steps. 
Firstly, the intended user and system designer decide upon criteria 
needed for retrieval. Secondly, a code is made of these criteria. Thirdly,
a user who is a specialist in his subject area transcribes all
records that will be put into store according to the code of the criteria.
Later the user simply specifies which particular criteria he is looking
for in a record and a search is made of the coded records to retrieve
pertinent records. Altmann (1966, pp. 154-157) describes such a design in
the implementation of his system.
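
The three steps above can be pictured with the following sketch. The
criteria, code values, and record numbers are hypothetical and are not
those of Altmann's system or of the system designed in this thesis.

```python
# Step 2: a code agreed between user and designer (hypothetical values).
CODE = {
    "subject": {"allergy": 1, "neurology": 2},
    "level": {"undergraduate": 1, "graduate": 2},
}

# Step 3: a subject specialist transcribes each record into the code by hand.
store = [
    {"id": 101, "subject": 1, "level": 2},
    {"id": 102, "subject": 2, "level": 2},
    {"id": 103, "subject": 2, "level": 1},
]


def search(store, **criteria):
    """Return ids of records whose coded fields match every stated criterion."""
    wanted = {field: CODE[field][value] for field, value in criteria.items()}
    return [r["id"] for r in store
            if all(r[f] == c for f, c in wanted.items())]


graduate_neurology = search(store, subject="neurology", level="graduate")
```

The program only compares codes; all judgment about what a record is
about has been exercised by the human coder before the record entered
the store.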

Indexing performed by trained indexers is extremely detailed and as 
such it is on the whole superior to automatic indexing. The difficulty 
of such indexing is the time and cost of indexing by a person who is not 
only trained in indexing but also a specialist in his own subject field. 
Though he would be required to code the original data and all updated 
versions, such a person would not be gainfully employed by indexing alone
unless the bank was very large. Hence, such procedures require
finding a person who is interested in working in interdisciplinary
fields. This of course is not always easy.


Automatic Indexing

Since the advent of the modern computer, system designers have 
always dreamed of systems which would eliminate almost all human effort. 
Thus, attempts naturally were made to enable the computer to take over 
the task of indexing. The automatic indexing procedure follows only two 
basic steps. Firstly, the intended user and system designer decide upon 
criteria needed for retrieval; this step is the same for both manual and 
automatic indexing. Secondly, the programmer must write a program to 
assign indexes to records. This is usually accomplished by some kind 
of word matching; for example, author names, titles, phrases, citations, 
and so on are searched by the computer and if a match is found with words 
written into the program, the computer is instructed to convert the 
record to a code for more convenient use. Thus a user simply specifies 
which criteria he wants a record to meet, for example, "neurology, McGill, 
graduate." The computer would convert these specifications to the same 
code as used for the records in store and then search for a match. 
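
A minimal sketch of indexing by word matching follows. The word list,
its code values, and the record texts are hypothetical; only the request
"neurology, McGill, graduate" is taken from the example above.

```python
import re

# Hypothetical word list "written into the program": word -> index code.
WORD_CODES = {"neurology": 2, "mcgill": 17, "graduate": 31}


def auto_index(text):
    """Assign index codes by matching whole words against WORD_CODES."""
    words = re.findall(r"[a-z]+", text.lower())
    return {WORD_CODES[w] for w in words if w in WORD_CODES}


# Records are indexed by the program when they enter the store.
store = {
    "R1": auto_index("Graduate examination in neurology, McGill University"),
    "R2": auto_index("Undergraduate review of cardiology"),
}


def retrieve(request):
    """Convert the request to the same code and return records meeting all of it."""
    codes = auto_index(request)
    return sorted(rid for rid, rec in store.items() if codes <= rec)


hits = retrieve("neurology, McGill, graduate")
```

The same function codes both the records and the request, which is what
allows the two to be compared directly.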

There are, however, serious limitations to such procedures. First, 
there is the problem of synonyms. The programmer must foresee the 
possibility that records dealing with "cats" belong in the same cluster 
as records dealing with "felines." He must also be able to accommodate
users who may not use either of these words but may instead specify
"pussy" or "kitten." It is likely that such a user would be interested in
the records dealing with "cats" or "felines," but if the program has not
converted all these words to the same code he will not get such documents.
The inclusion of a synonym dictionary presents not only the problems of
incompleteness and programming effort, but also that of requiring
more core space. As one uses up space for the program to index records 
and user requests, space is lost that could be used for blocking records, 
pointers, or similar features in one's file organization. Thus the use 
of automatic indexing necessitates large and expensive memories, excessive 
programming, and slower operation--if it can be done at all. Certain 
authors (Hammond, 1964, pp. 237-293; Wallace, 1964, pp. 225-235) also 
point to the difficulty of analyzing phrases and syntactic relationships 
in automatic indexing. They found that different subject areas not only 
had a unique vocabulary but that different habits were found among 
writers in using the most common words. Thus variation in subject field
and variation in style of writing impose serious restrictions on
adequate automatic indexing. It seems, therefore, that Lipetz (1966)
makes a valid claim in stating ". . . when concepts buried in multiple-word
phrases must be recognized . . . the human specialist is still quite
able to compete with the computer [pp. 155-156]." Garfield (1964, pp. 
189-192) also claims that considerable standardization of stored documents 
is necessary before automatic indexing becomes adequate and this, he 
feels, is unachievable for many years to come. 
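
The synonym problem raised earlier in this discussion can be made
concrete with a small sketch. The dictionary contents and document
labels are hypothetical; the point is only that every variant must be
foreseen and stored, and that any word left out is silently missed.

```python
# Hypothetical synonym dictionary: every variant maps to one canonical code.
SYNONYMS = {"cat": "CAT", "cats": "CAT", "feline": "CAT",
            "felines": "CAT", "kitten": "CAT"}


def code_for(word):
    """Canonical code for a word, or None when the dictionary lacks it."""
    return SYNONYMS.get(word.lower())


# Each stored document is represented here by a single indexing word.
records = {"D1": "cats", "D2": "felines", "D3": "tabby"}


def retrieve(term):
    wanted = code_for(term)
    if wanted is None:
        return []          # the request word was not foreseen
    return sorted(d for d, w in records.items() if code_for(w) == wanted)


found = retrieve("kitten")
```

Here a request for "kitten" reaches the documents on "cats" and
"felines," but document D3 is never retrieved because "tabby" was not
entered in the dictionary; enlarging the dictionary to close such gaps
is what consumes the additional core space.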

Salton (1968, p. 345), however, has found in his work that automatic 
processing was not substantially inferior to manual coding. There is
also the great convenience of eliminating the manual indexing step, which
is time consuming and costly.

The resolution of the question as to which is better will probably
not be found until, as Jones (1969, p. 32) has suggested, further
experiments are carried out for the comparison of manual and automatic
thesauri, thereby establishing unequivocally whether one is better than
the other.


Associative Techniques 


Tinker (1966, pp. 96-102) has shown that as more descriptors or 
indexes are assigned to a document and are required in a request, the 
more difficult it is to retrieve an item. Furthermore, it has been
generally found that most system stores are rarely able to give the
user the items with the exact characteristics for which he is looking,
even when limited descriptors are used. As a result, statistical
association techniques have become widely used as a means of increasing 
the number of relevant records retrievable in response to a specific 
search request. It becomes possible, therefore, to retrieve not only 
those items which are an exact match of the criteria specified, but also, 
if needed, those stored items that meet most, but not all, of the 
criteria. 

Doyle (1964, pp. 15-24) retrieves items on the basis of a ranking 
hierarchy, where items meeting the most criteria are obtained first, 
then successively those items that meet fewer and fewer of the criteria.
Other designers (Wittman and Ingerman, 1967; Mathews & Thomson, 1967)
use a threshold selection technique in their systems. Consecutive
integers are assigned as weights, commencing with unity for the least
preferred term, with a minimum value specified for a stored item in
order for it to be retrieved. If the sum of the weights of criteria met
by a given record exceeds this threshold score it is considered pertinent.
Salton (1968, p. 140) presents experimental evidence that the use of
these weighted identifiers is always more effective than methods
retrieving only records of exact matches.
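
The threshold technique can be sketched as follows, assuming the request
lists its terms from most to least preferred. The store, the terms, and
the threshold value are hypothetical, and the sketch is not claimed to
reproduce any of the cited systems.

```python
def weights(criteria_by_preference):
    """Consecutive integers, starting at 1 for the least preferred term."""
    ordered = list(reversed(criteria_by_preference))  # least preferred first
    return {term: i for i, term in enumerate(ordered, start=1)}


def score(record_terms, w):
    """Sum of the weights of the criteria the record meets."""
    return sum(wt for term, wt in w.items() if term in record_terms)


def retrieve(store, criteria_by_preference, threshold):
    """Records whose score exceeds the threshold, highest score first."""
    w = weights(criteria_by_preference)
    scored = [(score(terms, w), rid) for rid, terms in store.items()]
    hits = [(s, rid) for s, rid in scored if s > threshold]
    return [rid for s, rid in sorted(hits, key=lambda p: (-p[0], p[1]))]


store = {
    "A": {"allergy", "immunology", "graduate"},
    "B": {"allergy"},
    "C": {"graduate"},
    "D": {"allergy", "graduate"},
}
request = ["allergy", "immunology", "graduate"]   # most preferred first
pertinent = retrieve(store, request, threshold=3)
```

With these weights (allergy 3, immunology 2, graduate 1) record A scores
6 and record D scores 4, so both are retrieved, whereas a search for
exact matches only would have returned A alone.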

Some researchers (Edmundson, 1964; Kuhns, 1964) use correlation 
coefficients as a means of ranking the retrieved items. A binary vector 
indicating the criteria is correlated with a similar vector for the 
stored item. Records are then ranked according to the size of the 
correlation coefficient and are retrieved in rank order. 
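
A sketch of such a ranking is given below, assuming that the ordinary
product-moment correlation is computed between two vectors of zeros and
ones; the cited authors' exact coefficients may differ. The vectors and
record labels are hypothetical.

```python
from math import sqrt


def correlation(x, y):
    """Product-moment correlation of two equal-length 0/1 vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: a constant vector is treated as uncorrelated
    return cov / sqrt(vx * vy)


# One position per criterion: 1 = criterion requested / met, 0 = not.
request = [1, 1, 0, 0]
store = {"A": [1, 1, 0, 0], "B": [1, 0, 0, 0], "C": [0, 0, 1, 1]}

ranked = sorted(store, key=lambda rid: correlation(request, store[rid]),
                reverse=True)
```

Record A, which matches the request exactly, correlates perfectly and is
retrieved first; record C, which meets only the criteria not requested,
is ranked last.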

Some ambitious attempts have been made to implement procedures that 
will retrieve items that do not belong exclusively to one category. 
Multi-dimensional approaches such as discriminant analysis by Williams 
(1964) and factor analysis by Borko (1964) have been developed to 
determine the degree of relevance of a given stored document with respect 
to more than one category. 

Some of the most successful systems try to improve the user's satisfaction
with his retrieval system by providing iterative steps. Salton
(1966, p. 345) has found that, in actual practice, users have many
differing needs, some wanting very exhaustive answers and others being 
content with a single reference. One of the easiest ways to adapt to 
these various needs is a provision in the system for more interaction 
between the user and the machine. Many (Lipetz, 1966; Parker, 1966; 
Bryant, 1966; Licklider, 1965) suggest that results of initial searches 
should be given to the user. He then modifies his request and a more 
refined search is made on the basis of these needs. This iterative
procedure, though somewhat slower, no doubt offers more satisfaction
to the user.

As an extension of, and as a means of improving the time lag of the
iterative approach, promising investigations (King, 1968; Rubenoff & 
Bergman, 1968; Belz, 1967) are being made with the so-called interactive 
systems. These systems, incorporating on-line terminals, are designed to 
improve search and retrieval by reducing the time between iterative 
searches. The user is provided with a terminal (for example, a cathode 
ray tube for display of retrieved items, and a typewriter for user 
responses) with which, in one sitting, he is able to see the results of 
his first request and make adjustments to his requests on the basis of
information received, continuing the iteration until he is satisfied.


Some Significant Retrieval Systems 


A review of the literature would be incomplete without a description 
of some of the more advanced and well known retrieval systems. One of 
the more widely accepted systems for retrieving documents has been the 
KWIC (Key Word in Context) system. Input is either running text, 
abstracts, author-title, or keywords. The system is generally used to 
index documents by the words in their titles. The computer reads all 
the words in all the titles and alphabetizes these words. Then it prints 
out these words for all titles in alphabetical order on successive lines 
but keeps with these words the content of the titles and a code for 
locating the full text. No attempt is made to associate synonyms so the 
user faces the task of searching for synonyms himself. To eliminate 
useless printing and searching, the system can be instructed not to print 
entries for commonplace words such as "and," "the," etc. The journal
Chemical Titles is such an index and serves as an alerting service for
chemists regarding recent articles that have appeared in selected journals. 
Many researchers (Benson, 1965; Stiles, 1965; Sage, 1965; Sprague, 1965;
Stewart, 1966; Stevens, 1964, p. 283) incorporate many of the basic 
principles of KWIC in their systems. 
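
The KWIC procedure described above can be sketched as follows. The
sketch simplifies the printed layout by rotating each title so that the
keyword leads the line (a "/" marks where the title originally ended);
the titles, locator codes, and list of suppressed words are
hypothetical.

```python
STOP_WORDS = {"and", "the", "of", "in", "a", "for"}


def kwic(titles):
    """titles: {locator code: title}. Returns sorted (keyword, context, code)."""
    entries = []
    for code, title in titles.items():
        words = title.split()
        for i, word in enumerate(words):
            key = word.strip('.,;:"').lower()
            if not key or key in STOP_WORDS:
                continue  # no entry is printed for commonplace words
            # Rotate the title so the keyword leads; "/" marks the title's end.
            context = " ".join(words[i:] + ["/"] + words[:i])
            entries.append((key, context, code))
    return sorted(entries)


titles = {
    "T1": "Storage and Retrieval of Test Items",
    "T2": "Retrieval Systems",
}
index = kwic(titles)
for keyword, context, code in index:
    print(f"{context:<45} {code}")
```

Both titles appear under "retrieval," each with the remainder of its
title and its locator code, but a title using a synonym such as
"search" would be filed elsewhere and would have to be looked up
separately by the user.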

The limitations of such a system are, of course, not only the limita- 
tions associated with any automatic indexing system but also those imposed 
by the lack of a synonym dictionary. 

Since this study is specifically related to medical applications,
the MEDLARS (Medical Literature Analysis and Retrieval System) will be
specifically noted. MEDLARS is a computerized information retrieval
system for use by the National Library of Medicine, its design and
implementation being carried out by General Electric, using
techniques similar to those of KWIC. An entry into this system consists
of a citation and
its associated tags, described as a unit record. Both periodical and
non-periodical articles may be a unit record. Kent (1966) maintains this
is the most sophisticated and most completely automated medical library
system available. However, Cookley (1967) points out that many of the 
titles of articles were often inadequate in reflecting the content of 
the article. Minker (1969) also criticizes the system in terms of its 
failure to reject irrelevant material. Therefore, despite its sophis- 
tication, MEDLARS still has the inherent weaknesses of many other 
automatic indexing systems. 

One of the finest retrieval systems in the world today is the SMART 
system. Implemented on both the IBM 360 and 7094, it is essentially a 
laboratory for retrieval principles and procedures. Doyle (1969) refers
to it as ". . . experimentation in documentation
area the like of which is seldom seen [p. 271]."

Salton (1968, pp. 9-20) describes SMART as a fully automatic text
processing system which manipulates documents and search requests, 
expressed in natural language, and produces the documents that appear to 
be most similar to requests. The system is characterized by several 
hundred different content analysis procedures, all available to generate 
indexes to records and requests, including word matching methods, stored 
dictionaries to lessen the effect of vocabulary variations, statistical 
and syntactic procedures to identify relations between words and concepts,
and phrase generating methods. Thus a means is provided for attacking 
the content analysis process from a number of different viewpoints, each 
of which provides a somewhat different output. It is possible, therefore, 
for a search process to be conducted in such a way that search requests 
producing unsatisfactory results can be reproduced under altered 
conditions. The new output can be examined and, depending on require- 
ments, further changes can be made until a satisfactory retrieval is 
obtained. 

The user is also requested to identify those records considered to 
be most useful. The system then automatically adjusts the search 
request by increasing the weights of the requested terms that were also 
contained in the designated set of relevant documents and decreasing the 
weights of those not relevant. Effectively this process shifts the 
request vector so that it lies closer to the relevant document subset 
than to the nonrelevant subset. In this manner, similar future requests
will presumably receive only the most pertinent records.

It would seem that this system has much to offer other designers
in the structuring of their data management. If one were able to use 
the system for his own records he could compare which method provided 
the most pertinent records for the user and then incorporate that
automatic indexing procedure into his own system.
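
The request adjustment that SMART performs on the user's relevance
judgments can be illustrated by the following greatly simplified sketch.
It is not SMART's actual procedure: the fixed `step`, the representation
of documents as sets of terms, and the example terms are all assumptions
made for the illustration.

```python
def adjust_request(request, relevant_docs, nonrelevant_docs, step=1.0):
    """Raise the weight of request terms found in relevant documents and
    lower it for terms found in nonrelevant ones (`step` is assumed)."""
    adjusted = dict(request)
    for term in request:
        for doc in relevant_docs:
            if term in doc:
                adjusted[term] += step
        for doc in nonrelevant_docs:
            if term in doc:
                adjusted[term] -= step
    return adjusted


def similarity(request, doc):
    """Sum of the request weights of the terms the document contains."""
    return sum(w for term, w in request.items() if term in doc)


request = {"allergy": 1.0, "drug": 1.0}
relevant = [{"allergy", "immunology"}]      # judged useful by the user
nonrelevant = [{"drug", "dosage"}]          # judged not useful

new_request = adjust_request(request, relevant, nonrelevant)
```

Before the adjustment both documents match the request equally; after
it, the relevant document scores higher and the nonrelevant one lower,
which is the shift of the request toward the relevant subset described
above.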


Summary 


This chapter has reviewed some of the literature of information 
retrieval as it is related to theory, system design, and data management. 
One can summarize the implications of these readings as follows. 

There is certainly a need for the system designer to report his 
work in a very thorough, practical manner, noting all relevant specifica- 
tions, problems, and evaluations of his work. Furthermore, his design 
should be analyzed in terms of the principles underlying the theory of 
information retrieval. This not only provides a basis for contributions 
in the field but also provides a frame of reference for evaluation. 

The designer and programmer should also try to meet Soergel's (cf. 
p. 11) requirements for the programming language used in order to insure 
its maximum facility. The proper choice of a file organization is also 
necessary in order to meet the demands of the intended users as well as 
the available hardware. Close association with the intended users also 
facilitates the designer's understanding of objectives and expectations
and thus enables him to choose the appropriate indexing procedure,
a practical retrieval iteration, and adequate
hierarchical outputs.

These implications as related specifically to this study will now
be discussed in Chapter III.


CHAPTER III


STORAGE AND RETRIEVAL DESIGN 


Introduction 


This chapter confronts the problem of designing a specific system 
for handling multiple choice questions. In attempting to do this, 
Chapter III is divided into two basic units. The first unit deals with
the generalized constraints and specifications that are expected to be
encountered in designing retrieval systems for multiple choice items.
The second unit deals with the implementation of this design,
specifically that relating to medical documents.


Constraints and Specifications of System Design 


Introduction 

Before any information retrieval system is implemented there are a 
number of design constraints and specifications that must be considered. 
These are not exclusive to this study but apply to all information 
systems. Firstly, the designer must consider the nature of his data 
base, which invariably is determined by the discipline using the system. 
Secondly, the choice of file organization depends upon the available 
hardware and software. Thirdly, the designer must provide a system that 
is compatible with the human user. Lastly, the nature of a design will
depend upon the available supporting resources, which include not only
manpower but also finances. The nature of these constraints will now be
discussed in detail.


Data Base

Since the data base being considered is multiple choice examination 
questions, a designer must consider those constraints which are related 
to education and measurement. Multiple choice questions vary a great deal 
in their length; therefore the system design must provide for variable record 
lengths in retrievals. Furthermore, educators usually do not file only
the text of an item, but also information related to the item's
subject matter, taxonomic level, correct response location(s), number of
times used, year in which it was last used, the examination in which it
was last used, the source of the item, its difficulty level, and biserial
correlation. Some educators now provide audio-visual material with
their questions, and this source is recorded. A design for storing and
retrieving multiple choice questions on computer must therefore accommodate
not only documents (the text of an item) but also measurements (an
item's descriptive indexes). The design must incorporate the features
of two standard retrieval systems--information retrieval and reference 
retrieval (cf. p. 6). Since educators most often choose items for an 
examination on the basis of their descriptive indexes the designer must 
also provide codes for these indexes either manually or automatically. 
Most of the descriptive indexes mentioned above are almost impossible 
to code automatically. Biserial coefficients and difficulty levels can 
only be known after an item has been used and analyzed, the procedures 
for which are entirely separate from retrieval. Similar problems exist 
for coding the number of times a question is used or in what examination 
it was last used. Taxonomic levels are most easily determined by the
educator, not a programmer. Such conditions make it necessary for manual
coding to be used in lieu of automatic procedures for a design handling
multiple choice items as its data base.
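
The two parts of an item record described in this section, the text and
the descriptive indexes, can be pictured with the following sketch. The
field names, code values, and sample figures are hypothetical and do not
represent the record layout implemented in this study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ItemRecord:
    # Document part: free text of any length.
    text: str
    # Measurement part: descriptive indexes coded manually by the educator.
    subject: int
    taxonomic_level: int
    correct_responses: tuple
    times_used: int = 0
    last_year_used: Optional[int] = None
    last_examination: Optional[str] = None
    source: Optional[str] = None
    difficulty: Optional[float] = None   # known only after item analysis
    biserial: Optional[float] = None     # known only after item analysis
    av_source: Optional[str] = None      # audio-visual material, if any

    def record_use(self, year, examination, difficulty, biserial):
        """Update the indexes that become known only after an item is used."""
        self.times_used += 1
        self.last_year_used = year
        self.last_examination = examination
        self.difficulty = difficulty
        self.biserial = biserial


item = ItemRecord(text="Which of the following ...", subject=3,
                  taxonomic_level=2, correct_responses=(2,))
item.record_use(1969, "Final", difficulty=0.62, biserial=0.41)
```

The indexes filled in by `record_use` are exactly those that cannot be
derived from the text of the item, which is why they must be supplied
by hand after each administration.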


Hardware and Software 

Available computer facilities and the programming language used to 
execute these features influence the design of such a system. At the 
University of Alberta the IBM 360/67 installation provides each user with 
200,000 bytes of core storage, a relatively large space which would 
accommodate the large dictionary files used in random and list file 
organizations. However, the hardware facilities at this installation 
have not been completely debugged for handling these two latter files. 
Sequential files, on the other hand, do not present such problems and 
are therefore the safest to use. 
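
A sequential search can be sketched as follows, assuming a hypothetical
record layout in which a key and the record body are separated by a
vertical bar. The file is read from front to back and only the record
currently being examined is held in memory.

```python
import io


def scan(stream, wanted):
    """Read the file front to back, one record at a time, yielding matches."""
    for line in stream:
        key, _, body = line.rstrip("\n").partition("|")
        if key == wanted:
            yield body


# An in-memory stand-in for a tape file (hypothetical contents).
tape = io.StringIO("allergy|item 1\nneurology|item 2\nallergy|item 3\n")
hits = list(scan(tape, "allergy"))
```

No dictionary or pointer table is needed, at the cost of reading every
record on each search.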

Fortran IV programming language imposes no inherent constraints on 
the use of a sequential file and is therefore appropriate for such file 
organizations. This software does require, however, that alphanumeric 
characters be read under A (alphanumeric) format only. A checking 
procedure is then necessary in the design to insure that the text of an 
item is not read under the I (integer) or F (floating point) formats used 
to read an item's indexes. 

The choice of a sequential file organization and the use of the Fortran
programming language in designing a retrieval system make the
system widely adaptable. Firstly, the software is a widely
known and used programming language in retrieval systems. Secondly, 
sequential file organizations are widely known, used, and easily debugged. 
Lastly, sequential searches require minimal amounts of core storage and
therefore allow the system to be implemented on very small installations.


Human User

Any design must incorporate means to detect, diagnose, and if 
possible reduce the inevitable element of human error. This type of 
error is not localized to the intended users--educators. It is 
introduced by everyone connected with the system--writers, typists, 
coders, keypunchers, programmers, and users. Further constraints are 
imposed upon the system by educators, however, since optimal use of the
retrieved items or data is only attained when the system accommodates the
idiosyncrasies and needs of users.

To reduce human error, the designer must provide checks to insure
that records containing the text of an item and its indexes are in
proper order before being stored on tape. Provision must be made for
correcting an item's text or its indexes, whether the need arises from human
error or from the accumulation of new knowledge. Whenever and wherever the
element of human mistake may be present, the designer must try to
provide means to circumvent it and/or give appropriate diagnostic messages
to the user so that he may rectify the problems before making subsequent
requests.
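
An order check with diagnostic messages might be sketched as follows.
The card layout is entirely an assumption made for the illustration:
each card is taken to carry an item number and a type, "T" for text or
"X" for indexes, and each item is taken to consist of one or more text
cards followed by one index card, with item numbers ascending.

```python
def check_deck(cards):
    """cards: list of (item_no, kind). Returns a list of diagnostic messages."""
    messages = []
    current, seen_text, closed = None, False, True
    for pos, (item_no, kind) in enumerate(cards, start=1):
        if kind not in ("T", "X"):
            messages.append(f"card {pos}: unknown card type {kind!r}")
            continue
        if item_no != current:                     # a new item begins
            if not closed:
                messages.append(f"card {pos}: item {current} has no index card")
            if current is not None and item_no < current:
                messages.append(f"card {pos}: item {item_no} is out of sequence")
            current, seen_text, closed = item_no, False, False
        elif closed:
            messages.append(
                f"card {pos}: extra card after indexes of item {item_no}")
        if kind == "T":
            seen_text = True
        else:
            if not seen_text:
                messages.append(
                    f"card {pos}: indexes of item {item_no} precede its text")
            closed = True
    if not closed:
        messages.append(f"end of deck: item {current} has no index card")
    return messages


good = [(1, "T"), (1, "T"), (1, "X"), (2, "T"), (2, "X")]
bad = [(1, "X"), (1, "T"), (3, "T"), (2, "T"), (2, "X")]
problems = check_deck(bad)
```

Each message names the card position and the item concerned, so that the
deck can be corrected before it is written to tape.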

To increase the useability of retrievals, items should be ranked in 
order of their relevance to a user's request. However, professional 
educators rarely want a machine to choose only the exact number of items 
that will appear on an examination; therefore, a provision for giving 
an overflow of items of equal relevance is necessary in such a retrieval 
design, with checklists to facilitate decision by the user regarding 
which items are most suitable. Provision must also be made for educators 
who wish to modify their requests after primary retrievals. Since the
usefulness of some multiple choice questions depends on proper syntax,
it is imperative that retrievals be in a form that could be directly
copied onto an examination paper; the use of truncated words and omission
of participles and conjunctions is not advisable.

In practice the designer must be well acquainted with the discipline
of test design and with the intended users, educators, in order to
anticipate the constraints imposed from these sources upon the design of
a retrieval system for multiple choice items.


Supporting Resources 


Before a design can be implemented one must have the available 
resources, both physical and financial. In addition to the manpower 
needed in a noncomputerized file of multiple choice items, a computerized 
system requires a keypuncher, a programmer, and a system designer. 
Writers, typists, coders, and test committees are of course common to 
both the manual and machine files. 

An additional major source of expense, related to the computer
itself, is incurred with a machine-based system. Financial resources
must be available not only for computer cards and tape, but also for
the running time of the computer and the amount of core used.

Only the last two costs--computer time and space--are directly under
the control of the designer. By using sequential files and incorporating
proper job control language (such as blocking) one can keep costs to a
minimum. Since education costs are rising, this last constraint becomes
of primary importance.

Summary


The left-hand side of Figure 1 flowcharts the design steps necessary 
to handle all the constraints considered. The very nature of this design 
for storing and retrieving multiple choice items requires that the 
designer be acquainted not only with the disciplines of information and 
computing science, but also with the disciplines of education and test 
development. Without an interdisciplinary approach it would be difficult, 
if not impossible, to design a system that would not only operate, but 
also meet the needs of those who would use it. 

Discussion will now deal with the actual implementation of these
specifications, the steps of which are outlined on the right-hand side of
Figure 1.


Implementation of System Design


Introduction


In order to show the feasibility of the general design specifications
in Figure 1, this study developed a retrieval system specifically for
multiple choice items in medicine. This unit deals with that application.
As each design specification is implemented, the reader should refer to
Figure 1 in order to see the relationship between the general
specification and this study's example of its specific implementation.

The study incorporated a sequential file organization for storage and
retrieval. This choice was determined by the following considerations.
Firstly, and as noted before (cf. p. 33), the IBM 360/67 at the
University of Alberta is best suited to the handling of sequential files.
Secondly, because the intended users wanted a thorough retrieval based on
a variety of keys, Salton's suggestion (cf. p. 13) that a sequential file
may be the most practical organization seemed valid. Thirdly, cost, in
terms of the amount of core storage, is considerably lower. The system
analysts at this university have indicated that in the immediate future
users will be charged for increasing amounts of core in an exponential
manner; for example, using twice as much core may be four times as costly.
Since random and list files involve considerable amounts of active
memory for dictionaries and pointers, it becomes immediately questionable
whether or not the assumed increase in economy of time would outweigh
the added cost of core. Fourthly, the amount of decreased efficiency
in use of time for a sequential organization for this study remains
hypothetical. Remember that Dodd (cf. p. 16) noted that random access
is not suited to rapidly accessing a large number of retrievals, and that
retrieved items are usually of a uniform length. Though Dodd was not
specific as to how many retrievals constitute a "large number," so that
such an effect could not be estimated for this study, this investigator
did know that stored items varied a great deal in the number of records
composing each document. Conversely, blocking is available in the
sequential file which, as shall be noted later in this chapter (cf. pp.
74-76), increases the efficiency of processing such a file by a factor
approximating the blocking factor. Therefore, only further investigation
comparing the use of sequential, random, and list files for the data used
would be able to tell which organization would be optimally efficient
as well as inexpensive.
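
The effect of blocking can be illustrated with a little arithmetic. The following Python sketch is not part of this study's Fortran IV programs; it assumes, for illustration only, that processing cost is dominated by the number of physical block transfers, and the bank size and blocking factor are hypothetical.

```python
import math


def physical_reads(n_records: int, blocking_factor: int) -> int:
    """Physical block transfers needed to read n_records sequentially once."""
    if n_records < 0 or blocking_factor < 1:
        raise ValueError("need n_records >= 0 and blocking_factor >= 1")
    return math.ceil(n_records / blocking_factor)


# Hypothetical bank size and blocking factor, chosen only for illustration.
unblocked = physical_reads(2000, 1)
blocked = physical_reads(2000, 10)
speedup = unblocked / blocked
```

Under this assumption the reduction in transfers equals the blocking factor whenever the number of records is an exact multiple of it, and approximates it otherwise.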

Once a sequential type of file organization had been chosen, a decision
was made as to which coding technique--manual or automatic--was to be
used. To provide a frame of reference within which a comparison could be
made of these two techniques, the Department of Medicine was consulted as
to the characteristics they would be looking for in a retrieved item.
They expressed these four elements as most desirable: (a) area of
subspeciality (such as allergy, cardiovascular, collagen diseases,
etc.), (b) type of question (single or multiple answer), (c) taxonomic
level (approximating Bloom's (1956) taxonomy), and (d) core level (that
is, whether a question covered essential, important, or not so important
medical knowledge).

Three of these specifications--subspeciality, taxonomy, and core--
are definitely "concepts found in multiple-word phrases" as Liptez (cf.
p. 24) would say, and as such are most difficult, if not impossible, to
code automatically by computer. Any attempt at such automatic coding
would require an enormous dictionary that included not only synonyms
for words and phrases but also procedures for analyzing phrases and
syntactic relationships. The programmer would be required to know the
content of each document and the linguistic devices used to carry its
meaning. It was this investigator's conclusion that the task would be
difficult enough to make any attempt at automatically indexing multiple
choice questions under these codes seem futile. For support of this
viewpoint one is referred to Kurtzke (1967), who states "I do not
believe that standard medical records will be amenable to computer
search and retrieval except with the imposition of a trained human
encoder [p. 128]."

Little contribution was gained in the area of indexing, therefore,
from such reputable systems as KWIC, MEDLARS, and SMART. Note,
however, that some of the major criticisms levelled against these systems
were directly related to the shortcomings of inadequate, automatic
indexing. By choosing to use a manual indexing technique, this design
became less convenient but, as shall be seen in Chapter IV, can provide
more pertinent retrievals.


Item Indexes 

Since each multiple choice item was to be manually coded, it was
necessary to provide the human encoder with a standardized format for
the indexes which would accompany each item in the bank. The reader is
referred to Appendix A for the listing and format of these indexes (which
are a specific implementation of section (i) in Figure 1). The codes used
were determined by consultation with the Department of Internal Medicine,
the format by programming specifications. All item indexes can be
specified on two cards, with each code being allotted to specific columns
on one of these two cards. One of the rigidities encountered in the use
of computers is the requirement that codes be punched in the exact
columns specified, or else the information will be interpreted
incorrectly by the computer.

The following is an explanation of the indexes, that is, the type of
information desired by this particular medical department for each of
their examination questions. As indicated before (cf. p. 25), only a
limited number of criteria should be specified in a request in order to
improve retrieval; four criteria were used for this study.

The first of these four was "area of subspeciality." The term
subspeciality implies a category of a larger speciality, and indeed that
is what this category is. Within the speciality of Internal Medicine,
particular areas of medicine are delimited for purposes of education,
not unlike sections taught in a Science or Social Studies course. To be
able to evaluate a candidate comprehensively, an adequate representation,
if not all, of these defined subspecialities was needed. Twenty-three
areas were specified--allergy, cardiovascular, collagen disease, etc.,
through to physiology--and represented by the codes 1, 2, 3, ..., 23
respectively. This index was to be punched in an item's first parameter
card in columns one and two.

The second criterion, termed "type of question," referred to the two
types of questions used by this medical department--single and multiple
answer--both of which are explained by Hubbard and Clemans (1961). The
multiple answer question (as distinct from the single answer question,
which has only one of five choices correct) has two or more alternatives
correct. If an item being coded was a single answer type, a one was to
be punched in column three of the first parameter card; if multiple
answer, a two.

The third criterion was to be punched in column four and referred 
to the "taxonomic level" of the item being coded. The taxonomy is a 
cruder classification than Bloom's (1956) hierarchy for the cognitive 
domain, a one referring to a classification termed factual, a two for 
comprehension, and a three for problem solving. The first two were 
analogous to the first two levels of Bloom's hierarchy--knowledge and 
comprehension; the last, problem solving, encompassed the remaining 
hierarchy as suggested by Bloom with the exception that little or no
attempt was made in composing multiple choice questions at the level
of synthesis.

The fourth and final criterion was for classifying the importance of
the subject matter tested in a given item. Called "core level," its
codes one, two, and three refer to essential, more important than
unimportant, and more unimportant than important material respectively.
It was felt that this category was desirable in evaluating the
comprehensiveness of a particular group's or candidate's knowledge.

There are other indexes accompanying each item and the reader may 
refer to Appendix A to note their characteristics, codes, columns and 
parameter card number in which the codes were punched, and the abbrevi- 
ations which the computer used in designating a particular item's codes. 

In summary these are the remaining indexes which, although not
used in retrieval, provided the user with further useful information, and
presumably aided him in modifying his requests for iterative searches and
in making his own final selection.

(a) "Second area of subspeciality" allowed an item's inclusion in more
than one subspeciality category if necessary.

(b) "Source" indicated the institute or country from which the item was
obtained.

(c) "Province": if the item was authored in Canada (number three in
"source"), this index was to indicate the medical institution from which
the item was obtained.

(d) "Audio-Visual" was an optional code for indicating that the item had
material additional to the written text of a question. Since no hardware
facilities exist for storing line graphs, photographs, color sequences,
slides, movies, or video material on the 360/67, items making use of such
material had to be identified.

(e) "Audio-Visual I.D. location" would also have been specified if the
previous code, "Audio-Visual," had been specified. This identification
number would indicate to the user where this extra material was located
in the audio-visual file. At present Internal Medicine is not using
questions incorporating audio-visual material. This provision was
incorporated in anticipation of their future use.

(f) In the appropriate column(s) 15-19 a one was to be punched indicating
the "correct response alternative(s)." For example, a single answer
question with choice three correct would have a one punched in column 17;
for a multiple answer question with choices one, two and five correct,
a one would be punched in columns 15, 16, and 19 respectively.

(g) "Language" indicates whether the item is available in English,
French, or both languages. Since the examinations set for entrance to
the Royal College of Physicians and Surgeons can be taken in either
language, the code is a meaningful entry.

(h) "Number of times used" specified the number of times this particular
item had been used for the Royal College's examination. If this was left
blank, no further entries were required in the remaining columns of
parameter cards one or two except for the speciality, the card, and the
item identification numbers. These numbers must be specified on both
parameter cards, as will be explained later.

(i) "Last year question used" required only the last two digits of the
year.

(j) The next four codes referred to the last examination in which a
particular item was used--"Number of question on last exam," "Graduate or
undergraduate exam," "National or local exam," "ID of exam," "Number of
examinees on last exam." Though there was no storage of questions for
local or undergraduate examinations, a provision was made to accommodate
any future moves to use an information and retrieval system at these
levels.

(k) In columns 35-36 and 37-38 the "'p' values" or difficulty levels of a
single-answer item were to be specified. Provision was made for only the
last two testings of this item, as the examiners felt that no item would
be used, or at least left unmodified, after three testings.

(l) Columns 39-40 and 41-42 allowed specification of the biserial
coefficients for the same testings of an item.

(m) Columns 35-42 were of course not applicable to multiple-answer items
and would be left blank for such questions. Provisions for item analysis
data concerning the multiple-answer type of question involve more detail.
Since each choice received a mark (a candidate marked it as correct or
incorrect and received a mark for the correct identification), each had a
difficulty level and biserial coefficient. Added to this was the total
difficulty of the item (the average of the difficulty levels of the five
choices) and the biserial coefficient of correlation for the total item.
Columns 43-66 of parameter card one allowed for this data for the last
recorded testing year, and columns 1-24 of parameter card two for the
second last recorded testing year.

(n) In columns 25-34 of parameter card two, specifications were made for
the "Proportion of last test selecting" each of choices one to five.

(o) Finally, in columns 74-75 provision was made for the use of this
coding technique by other specialities. For Internal Medicine the code
"01" was used. Any other specialities joining would receive another
distinctive number. Column 76 indicated the parameter card number, one
and two for the first and second parameter cards respectively. Columns
77-80 were provided for the identification number of the item, each item
in the bank receiving a unique value up to a maximum of 9999.
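
The fixed-column convention described above can be sketched in modern terms. The following Python fragment is an illustration only and is not the study's Fortran IV code. It decodes just those fields of the first parameter card whose columns are stated in the text; the names `ParameterCardOne`, `columns`, `parse_parameter_card_one`, and `punch` are hypothetical, and the column of the core level code is omitted because it is given only in Appendix A.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ParameterCardOne:
    subspeciality: int          # columns 1-2
    question_type: int          # column 3 (1 = single answer, 2 = multiple)
    taxonomic_level: int        # column 4
    correct_choices: List[int]  # decoded from columns 15-19
    speciality_code: str        # columns 74-75 ("01" = Internal Medicine)
    card_number: int            # column 76
    item_id: int                # columns 77-80


def columns(card: str, first: int, last: int) -> str:
    """Return card columns first..last (1-based, inclusive)."""
    return card[first - 1:last]


def parse_parameter_card_one(card: str) -> ParameterCardOne:
    card = card.ljust(80)
    if len(card) != 80:
        raise ValueError("a punched card has exactly 80 columns")
    answers = columns(card, 15, 19)
    return ParameterCardOne(
        subspeciality=int(columns(card, 1, 2)),
        question_type=int(columns(card, 3, 3)),
        taxonomic_level=int(columns(card, 4, 4)),
        correct_choices=[n + 1 for n, ch in enumerate(answers) if ch == "1"],
        speciality_code=columns(card, 74, 75),
        card_number=int(columns(card, 76, 76)),
        item_id=int(columns(card, 77, 80)),
    )


def punch(fields: dict) -> str:
    """Build an 80-column card image from {starting column: text}."""
    image = [" "] * 80
    for start, text in fields.items():
        image[start - 1:start - 1 + len(text)] = text
    return "".join(image)


# Illustrative card: allergy (1), multiple answer (2), factual (1),
# choices one, two and five correct, Internal Medicine, card 1, item 42.
example = parse_parameter_card_one(
    punch({1: " 1", 3: "2", 4: "1", 15: "11  1", 74: "01", 76: "1", 77: "0042"})
)
```

Shifting any code by a single column changes the decoded value or raises an error, which is the rigidity noted earlier in this section.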

In concluding this section on item indexes, the reader will note
that the codes referred to the specific needs expressed by the Department
of Internal Medicine. The numbers (or codes) themselves, however, are
not inherently restricted to a particular subject matter field. They
are simply a means for reducing information as known to the human into
a more concise, usable form for the information machine.


Programming for Retrieval: MEDSIRCH-1 

To provide an iterative approach to retrieval two programs were 
written in Fortran IV programming language and termed MEDSIRCH-1 and 
MEDSIRCH-2. The first of these programs provided the initial search, 
the second being used for the modified requests. In the following 
explanation of these programs the reader will see how programming effort 
can be reduced by attempting to use many of the same features in both 
programs. 

Appendix B provides the compilation listing of MEDSIRCH-1.
Provided that stored items, indexes, and user requests conform to format
specifications, this listing provides the Department of
Internal Medicine with a debugged, effective, functional program for
retrieving multiple choice items.

Preparation of items for storage. To implement section (ii) of
Figure 1, this study required that all items be punched according to
rigid format specifications. Specifically, the text of an item had to be
restricted to columns 2-69 of each punched card. Since the total item
often consists of many lines, provision was made for indicating the item
identification number on all cards. This number was punched in columns
74-77. The last column (80) of each card containing a given item needed
one of three indexes--"C," "*," or a blank. These codes have the
following functions. A "*" indicates that when an item is retrieved the
printer should skip a line before printing the text stored on the next
card. A "C" causes the printer to write on the next line. This spacing
provides an aesthetic quality to the output and presents the text in a
manner similar to that used in an examination. If column 80 has a blank,
this indicates that it is the last card containing the text of a given
item. The program then instructs the computer to switch from the format
for reading alphanumeric text (A format) to a format for reading the two
parameter cards containing an item's indexes (Appendix A).

Having read an item and its two parameter cards the program then 
expects the appearance of another item with its parameter cards. There- 
fore, in setting up the bank one must punch an item, its parameter cards, 
the next item, its parameter cards, and so on to the end of the bank. 

The end of a bank is designated by an "E" in column 80. MEDSIRCH-1 
then instructs the computer to stop reading the bank. 
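
The reading sequence just described (text cards, two parameter cards, then the next item, until the "E" card) can be sketched as follows. This is a Python illustration under stated assumptions, not the MEDSIRCH-1 source: the function names are hypothetical, the parameter cards are kept as undecoded card images, and the "E" is assumed to sit on a separate card placed where the next item's first text card would otherwise appear.

```python
def read_bank(cards):
    """Group 80-column card images into items; stop at the 'E' card."""
    stream = iter(cards)
    items = []
    for card in stream:
        card = card.ljust(80)
        if card[79] == "E":          # end-of-bank marker in column 80
            break
        item_id = int(card[73:77])   # columns 74-77 of a text card
        text = []
        while True:
            control = card[79]       # column 80: 'C', '*' or blank
            text.append((card[1:69].rstrip(), control))  # columns 2-69
            if control == " ":       # blank marks the last text card
                break
            card = next(stream).ljust(80)
        parameters = (next(stream).ljust(80), next(stream).ljust(80))
        items.append({"id": item_id, "text": text, "parameters": parameters})
    return items


def render(text):
    """Lay out an item's text; '*' leaves a blank line before the next card."""
    lines = []
    for line, control in text:
        lines.append(line)
        if control == "*":
            lines.append("")
    return "\n".join(lines)


def text_card(text, item_id, control):
    """Hypothetical helper that punches one text card for the example."""
    return (" " + text[:68]).ljust(73) + f"{item_id:04d}" + "  " + control


deck = [
    text_card("Stem of the question", 7, "*"),
    text_card("A. first choice", 7, "C"),
    text_card("B. second choice", 7, " "),
    "parameter card one".ljust(80),
    "parameter card two".ljust(80),
    " " * 79 + "E",
]
bank = read_bank(deck)
```

Calling `render(bank[0]["text"])` on this deck places a blank line between the stem and the first choice, mirroring the examination layout.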

The bank of items for this study was stacked on tape (section (iv) 
of Figure 1) with the utility program provided by the IBM 360/67 system. 
The system cards for calling this facility are given in Appendix C. 

This type of data set is read in the same manner as cards, but can 
contain a greater number of records than the limit of 2000 records 
imposed on cards. Though 2000 cards is an arbitrary number decided upon 
by the operators of the IBM 360/67 at this university, one would find it 
most difficult to manually use any more records than this in any case. 

Before cards were stacked onto tape this investigator found it
advantageous to use the program CHECK (Appendix D) to ensure that items
and their respective parameter cards were properly collated (section (iii)
of Figure 1). This was necessary for a number of reasons. (a) Firstly,
and most importantly, if any card (other than the last card) containing
the text of the item had a blank in column 80 the computer ceased
executing MEDSIRCH-1 after reading such a card. This was due to the
fact that a blank index in column 80 caused the computer to expect a
parameter card to be read under integer and floating point format. Since
the next card was still in alphanumeric format, a syntax error was
registered and execution was arrested after printout of the message
IHC215I--illegal decimal character. (b) Secondly, the program CHECK
indicated a number of errors that the author or the keypuncher may have
made in collating the items and their parameter cards. Table 1 provides a
list of possible mistakes the user may make and the diagnostic messages
the program CHECK gives for each one.
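
A collation check of this kind might be sketched as below. This Python fragment is an illustration only: it is not the CHECK program of Appendix D, its diagnostic wording is invented rather than taken from Table 1, and it assumes that the item number is punched identically (four digits, zero filled) on text and parameter cards. It reports only the first error found, since card alignment is lost after that point.

```python
def check_collation(cards):
    """Return a diagnostic string for the first collation error, or None."""
    cards = [card.ljust(80) for card in cards]
    i = 0
    while i < len(cards):
        if cards[i][79] == "E":                      # end-of-bank card
            return None
        item_id = cards[i][73:77]                    # text cards: columns 74-77
        while i < len(cards) and cards[i][79] in ("C", "*"):
            i += 1                                   # skip continuation cards
        if i == len(cards) or cards[i][79] != " ":
            return f"card {i + 1}: expected last text card of item {item_id}"
        i += 1
        for number in ("1", "2"):
            if i == len(cards):
                return f"card {i + 1}: parameter card {number} of item {item_id} missing"
            card = cards[i]
            # Parameter cards: card number in column 76, item id in columns 77-80.
            if card[75] != number or card[76:80] != item_id:
                return f"card {i + 1}: expected parameter card {number} of item {item_id}"
            i += 1
    return "end-of-bank card ('E' in column 80) missing"


def text_card(text, item_id, control):
    """Hypothetical helper that punches one text card."""
    return (" " + text[:68]).ljust(73) + f"{item_id:04d}" + "  " + control


def parameter_card(number, item_id):
    """Hypothetical helper that punches the identifying columns 74-80 only."""
    return " " * 73 + "01" + str(number) + f"{item_id:04d}"


END = " " * 79 + "E"
good = [text_card("stem", 1, "C"), text_card("choices", 1, " "),
        parameter_card(1, 1), parameter_card(2, 1), END]
# Same deck, but the first text card was wrongly punched with a blank in column 80.
bad = [text_card("stem", 1, " ")] + good[1:]
```

Here `check_collation(bad)` flags the second card, the very situation in which MEDSIRCH-1 would otherwise stop with an illegal decimal character.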

The explanation of the CHECK program is included in this subsection
since it is also used as the PDISC subroutine of the program MEDSIRCH-1.
PDISC is called from the mainline program in order to write the entire
stacked tape onto a more efficient medium, that of disk (section (v) of
Figure 1). This procedure is done for two reasons. (a) Disk, as opposed
to tape, is a direct access medium and as such requires no rewind time.
Instructions to REWIND this type of data set involve the simple movement
of a READ/WRITE head (a hardware device) from its present location to
the beginning of a file. Conversely, a REWIND instruction to tape
literally involves the rewinding of a magnetic tape, not unlike the
procedure used on tape recorders. The use of disk, therefore, involves
great savings in time, especially if REWIND instructions are used as
much as they are in MEDSIRCH-1. (b) Writing on disk is done without
format and as a result the information is in a form that is immediately
processable by the computer.

Preparation of search requests. The use of MEDSIRCH-1 requires
minimal effort in the preparation of requests. The user is required to
specify, in fields of five columns each, the subspeciality, type of
question, taxonomic level, and core level codes as criteria to be met by
items retrieved. In the fifth field the user must also specify the
minimal number of items required; the sixth and last specification,
hierarchical level, indicates the level of pertinency to which retrieval
may proceed if the minimal number requested is not met. The six entries
constitute a request. Though a user may make as many requests as he
desires, each one is punched on a separate card. To indicate the end of
his requests the user inserts a blank card behind the last request card.

Figure 2 shows a set of requests with the concluding card blank.
The first card requests a retrieval to the last hierarchical level for 
a minimum of six items that meet these restrictions: (a) subspeciality: 
one (allergy), (b) type of question: two (multiple answer), (c) 
taxonomic level: one (factual), and (d) core level: one (essential 
information). After the program searches for such items in the bank, it 
then will search for items meeting the specifications on the second card 
and third card respectively. The blank card will properly stop the 
execution of this particular set of requests. 
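
The request format can be sketched in a few lines. The following Python fragment is illustrative only and is not the MEDSIRCH-1 source; `Request` and `read_requests` are hypothetical names, and each of the six entries is assumed to be a right-justified integer in a five-column field. The first card reproduces the request described above; the others are invented.

```python
from typing import List, NamedTuple


class Request(NamedTuple):
    subspeciality: int
    question_type: int
    taxonomic_level: int
    core_level: int
    minimum_items: int
    hierarchical_level: int


def read_requests(cards) -> List[Request]:
    """Parse request cards (six 5-column integer fields) up to a blank card."""
    requests = []
    for card in cards:
        if not card.strip():          # blank card ends the set of requests
            break
        fields = [int(card[start:start + 5]) for start in range(0, 30, 5)]
        requests.append(Request(*fields))
    return requests


deck = [
    "    1    2    1    1    6    1",
    "   20    1    1    1    6    4",
    "",
    "    2    1    1    1    3    1",   # never read: follows the blank card
]
requests = read_requests(deck)
```

Any card placed after the blank card is ignored, which is why the blank card must follow the last request and not precede it.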

Use of DATA statements. The mainline of MEDSIRCH-1 and its
subroutine PARMTR make valuable use of DATA statements in converting the
numerical codes of the parameter cards (Appendix A) and request cards
into verbal text that is written in the output. This verbal text is
designated in Appendix A in parentheses to the left of each index. For
example, a "1" for subspeciality will be printed in the output as ALL,
referring to allergy.

Use of DATA statements is also made for converting the codes placed
in the 80th column of each card of the text of an item ("C," "*," or
blank) into usable variables, as these alphanumeric characters are
meaningless in themselves.

Search technique. Execution in MEDSIRCH-1 begins with the mainline
calling the subroutine PDISC, where the tape bank (data set eight) is
subjected to the analyses of the program CHECK. The bank is then put on
the direct access medium of disk without format and is thereafter referred
to as data set one (see section (v) of Figure 1). After the entire bank
has been transferred to this medium in PDISC, a return is made to the
mainline, where data set one is rewound in order to begin searching for
all items pertinent to a user's request. At this point in execution
MEDSIRCH-1 is at section (i) of Figure 3. This figure flowcharts in
detail the search strategy used in this program. Search of the entire
bank on disk is made to see how many, if any, of the four criteria
specified in the request are met by each item. If a particular item
does not meet any of the criteria it is ignored. If an item meets any
criteria, it and its parameter cards may be rewritten as other data sets,
the location of which is determined by the following strategy for
hierarchical output of retrieved items.

All items meeting the four criteria are written on data set six,
that is, are printed in the output with a preceding title indicating
that the items meet all four criteria specified by the user (see
Figure 3). Those items meeting only the first three restrictions may be
written onto another disk and referred to as data set two (section (v)).
A similar procedure is carried out for those documents meeting the first
two restrictions and the first restriction only, where each set may be
written on other disks and referred to as data sets three and four
respectively (sections (vii) and (ix)). Counters are kept for the number
of items in each data set.

After the original bank (data set one) has been completely searched
(section (ii)), a check is made as to the number of items that were
printed (section (x)). If this number of pertinent retrievals is equal
to or greater than the minimal number that the user requested, this
particular search is terminated and the next request is read. If this
number of pertinent retrievals is less than the minimal number that the
user requested, then those items in data set two (those meeting only the
first three restrictions) may be read and printed in the subroutine SORT
(section (xii)). If the minimal number is still not met or exceeded,
those items meeting the first two restrictions and, if necessary, one
restriction, may be retrieved from the appropriate data sets (sections
(xiii-xviii)). It is possible that a bank with too few items in a
subspeciality may not provide the user with enough documents even at
the last hierarchical level.
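
The search just described can be summarized in a short sketch. The Python below is an illustration and not the MEDSIRCH-1 listing of Appendix B: in-memory lists stand in for data sets two to four, the names `leading_matches` and `search` are hypothetical, the small bank is invented, and the hierarchical level entry of the request is left out for simplicity.

```python
def leading_matches(item_codes, request_codes):
    """Count how many criteria match, in rank order, before the first miss."""
    count = 0
    for have, want in zip(item_codes, request_codes):
        if have != want:
            break
        count += 1
    return count


def search(bank, request_codes, minimum):
    """Return item ids in hierarchical order, stopping once minimum is reached."""
    # One pass over the bank: level 4 is printed first; 3, 2 and 1 are held back.
    held = {4: [], 3: [], 2: [], 1: []}
    for item_id, codes in bank:
        level = leading_matches(codes, request_codes)
        if level:                      # items failing the first criterion are ignored
            held[level].append(item_id)
    retrieved = []
    for level in (4, 3, 2, 1):
        retrieved.extend(held[level])  # a level is released whole, never cut short
        if len(retrieved) >= minimum:
            break
    return retrieved


# Hypothetical bank: (item id, (subspeciality, type, taxonomic level, core level)).
bank = [
    (1, (20, 1, 1, 1)),
    (2, (20, 1, 1, 2)),
    (3, (20, 1, 2, 1)),
    (4, (5, 1, 1, 1)),
    (5, (20, 1, 1, 1)),
]
```

Because each level is released in full, a request may return more items than the minimum asked for, which gives the user the overflow of equally relevant items called for in the design.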

Such a strategy implicitly assumes that the four criteria are ranked
relative to their importance. Thus, a user who is not able to receive
enough items meeting all four criteria considers that the next most
pertinent retrievals are those meeting only the first three criteria,
not any three criteria. The analogous situation is true at lower
hierarchical levels of pertinency. The choice of this strategy was
based on the needs expressed by the intended users of this program,
the Department of Internal Medicine. Since the degree of success for any
information system is improved by the extent to which the system
accommodates the idiosyncrasies of users, the choice of this search
strategy for Medicine is valid. Similar needs exist in other educational
fields as well. At examination time school teachers want items first and
foremost for a particular grade, then for a course (e.g., Social
Studies), then for a particular section (e.g., geography), and at least
within a particular range of difficulty levels and/or biserial
coefficients. If items were retrieved in subsequent hierarchical levels
on the basis of meeting any three, two, or one restriction, some
retrieved items would be irrelevant. For example, consider a request for
items in grade eight Social Studies, on geography, with difficulty levels
within the range .2-.8. If items were retrieved at hierarchical levels on
the basis of meeting a particular number of criteria, regardless of their
relative rank order, retrievals for this request might be items with
correct difficulty levels, in geography, but for the wrong grade.
Presumably, therefore, the retrieval strategy is valid for most needs
where the order of restrictions is ranked.
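
The difference between ranked and unranked matching in the grade eight example can be shown directly. This Python sketch is illustrative only; the two function names are hypothetical, and each item is reduced to a tuple recording whether it meets each criterion in rank order.

```python
def ranked_level(meets):
    """Leading criteria met, counting in rank order up to the first failure."""
    level = 0
    for met in meets:
        if not met:
            break
        level += 1
    return level


def unranked_level(meets):
    """Criteria met in any position, ignoring rank."""
    return sum(meets)


# Criteria in rank order: grade eight, Social Studies, geography, difficulty .2-.8
right_grade_only = (True, False, False, False)
wrong_grade = (False, True, True, True)
```

Counting matches in any position places the wrong-grade item at the second hierarchical level (three criteria met), whereas ranked matching excludes it and still retains the item that is at least for the right grade.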

It was discovered after the first retrieval that without the last
specification--hierarchical level--retrievals were often redundant.
For example, suppose a request was made for six items with indexes
20-1-1-1 when the bank could provide only three such items. The
program would then automatically retrieve items with 20-1-1-2 and 20-1-1-3
in order to build up the retrieval number to six. The user may have
specified in another request, however, a set of criteria exactly like
those of items retrieved at the second hierarchical level, for instance,
20-1-1-3. All items with these criteria as characteristics therefore
would be retrieved twice.

In order to reduce this redundancy, the user's request was modified 
to include one more variable, the hierarchical level to which he wished 
a retrieval to be made. In this way he could determine the number of 
levels of hierarchy to which retrieval can take place, regardless of 
whether or not the number of items he hoped to receive was attained. By 
specifying "4" (section (iv) of Figure 3) he instructs the computer to 
retrieve only items meeting four restrictions (section (xi)); a "3" 
(section (vi)) indicates that retrieval cannot extend into any lower 
hierarchy than that in which items meeting only the first three restric- 
tions are found (section (xiv)). Similar procedures apply for the 
specification of "2" (sections (viii) and (xvii)); a "1" would allow
retrieval to take place to all four levels if necessary (sections (ix)
and (xviii)).

In summary, this hierarchical specification may override the specification
for "number of items desired." If, however, the user allows
retrieval to proceed to a lower hierarchy than is needed, then the
specification "number of items desired" terminates retrieval.
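
The ranked, level-limited retrieval described above can be sketched in a few lines of Python. This is only an illustration and not the Fortran IV logic of subroutine SORT: the names matched_prefix and retrieve, the in-memory list standing in for the bank, and the sample index values are all assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple

Indexes = Tuple[int, int, int, int]  # four criteria, most important first


def matched_prefix(indexes: Sequence[int], request: Sequence[int]) -> int:
    """Count how many leading restrictions an item meets, in rank order."""
    met = 0
    for have, want in zip(indexes, request):
        if have != want:
            break
        met += 1
    return met


def retrieve(bank: List[Tuple[int, Indexes]], request: Indexes,
             wanted: int, level: int) -> Dict[int, List[int]]:
    """Return {restrictions met: [bank ids]}.

    level=4 allows only items meeting all four restrictions; level=1
    allows retrieval to descend through all four hierarchical levels.
    A whole level is always retrieved, so the total may exceed `wanted`.
    """
    results: Dict[int, List[int]] = {}
    total = 0
    for met in range(4, level - 1, -1):
        ids = [item_id for item_id, indexes in bank
               if matched_prefix(indexes, request) == met]
        results[met] = ids
        total += len(ids)
        if total >= wanted:  # "number of items desired" terminates retrieval
            break
    return results


# Hypothetical bank: (bank identification number, four indexes).
bank = [
    (1, (20, 1, 1, 1)), (2, (20, 1, 1, 1)), (3, (20, 1, 1, 1)),
    (4, (20, 1, 1, 2)), (5, (20, 1, 1, 3)), (6, (20, 1, 2, 1)),
    (7, (21, 1, 1, 1)),
]

print(retrieve(bank, (20, 1, 1, 1), wanted=6, level=4))  # {4: [1, 2, 3]}
print(retrieve(bank, (20, 1, 1, 1), wanted=6, level=3))  # {4: [1, 2, 3], 3: [4, 5]}
```

Because matching is by ranked prefix, item 7 (correct on the last three indexes but wrong on the first) is never retrieved, which is the behaviour argued for in the grade-eight example.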

Increasing useability of output. An effort was made to increase the
ease of reading retrievals and, hopefully, to aid the user in finding the
most relevant retrievals.

Since retrievals may extend to varying hierarchical levels, subroutine
SORT instructs the printer to write appropriate messages to the
user indicating how many restrictions are being met and whether or not
a given level of the hierarchy is able to fulfill the user's request.
Table 2 gives a list of all the possible messages. Items having the
characteristics designated by these messages follow immediately after in 
the printout. In order to keep track of the number of documents being 
printed out for a particular request, a cumulative number is printed 
beside the printed text of the item as well, and carries through all 
hierarchical levels. As a means of further identification the bank 
identification number of each item is also printed. 

Since not all items have the same degree of relevance (even within 
a given hierarchical level) provision has been made so that each item 
and its parameter information is printed on a different page. If the 
user so wishes, he may easily tear out the most pertinent items since 
each computer page is perforated. 

A listing for each request is printed as an indication to the reader 
as to which of his many requests a particular retrieval applies. A 
sample heading is provided in Table 3 for the request 21-2-1-2-1-4. 

Though a search is performed on the basis of a document meeting 
only four of the indexes, any retrieval does include a readable printout 
of the remaining indexes where the human coder has been able to provide 
information. The subroutine PARMTR performs the conversion of indexes 
to a mode that is interpretable by the user. Included in Table 3 is 
an example of such a listing entitled PARAMETERS FOR THIS ITEM. 

The subroutine PARMTR prints a check list for a person called the 
REVIEWER. The Department of Medicine makes use of such a person before 
a group, called the "test committee," finally selects which retrieved 
items will actually be used in an examination. This reviewer, in essence, 
performs an "updating" function, whereby any faults (grammar, content,
etc.) found in an item may be noted before the final copy of the item
reaches a test committee member (see Figure 1, section (vi)). This does
not imply that the reviewer's decision is final, however.

As the reader will see in Table 3, a final check list is provided 
for the test committee as well. (This listing is performed in the 
subroutine TEXT.) In this check list, the test committee members may 
indicate whether or not they will accept the modifications suggested by 
the reviewer. Provision is also made for the committee member to mark 
the relative usefulness of an acceptable item, the date, and his initials. 
The check list for both the reviewer and test committee not only reduces 
the clerical effort required of users after a retrieval, but also 
enhances the usefulness of the output. 

There can be little doubt that these small, often quite trivial
aids do as much to improve the usefulness of a retrieval as a more
fundamental element--creative search strategy. As such they are not to
be lightly regarded in designing an information storage and retrieval
system.


Programming for Iterative Requests: MEDSIRCH-2 

As noted before (cf. p. 26) provision for iterative requests has
been found to be an attractive means of improving the user's satisfaction with
his retrievals. The pilot run of MEDSIRCH-1 (which was to be the retrieval 
of a satisfactory number of multiple choice items for presentation to 
Internal Medicine's test committee setting the 1969 examination for the 
Royal College) indicated that provision for iterative requests was 
necessary in this design also. 

Because all items are retrieved within a given
hierarchy it is possible, for example, that 10 items would be retrieved
for a given set of criteria, even though the user had requested only five.
The presence of redundancy has already been discussed (cf. pp. 57-58).
In the pilot run these two factors caused MEDSIRCH-1 to retrieve a great
many more items than could feasibly be sent to the test committee for
consideration. To circumvent this problem the reviewer for Internal 
Medicine noted the items that were considered most relevant and made a 
request for those. MEDSIRCH-2 was the program used to search for these 
modified requests. 

To save programming effort the techniques used in coding, file 
searching, use of direct access media, and the format of input and 
output used in MEDSIRCH-1 were incorporated into MEDSIRCH-2 (section (v) 
of Figure 1). As a result the subroutines PDISC, PARMTR, and TEXT are 
identical in both programs. The subroutine SORT was eliminated, however, 
since the search was now for specific items, not for characteristics of 
items with the result that there was no need for retrievals at differing 
hierarchical levels. 

Preparation of search requests. Instead of preparing one card per 
request as in MEDSIRCH-1 the user had now to prepare two cards per
request. The first card, however, was much the same as before. Six
specifications are inserted onto this first card: (a) an identification
number of the request, (b) the four criteria as before, and (c) the
number of items required. Hierarchical level is no longer needed and 
is therefore omitted. 

The second card has its specifications punched as: (a) an identification
number of the request, matching the number of the first card;
and (b) the bank identification numbers of the items desired. If the
user is requesting more than fifteen items in this particular request he
inserts a third card (or as many as he needs) containing the remaining
item identification numbers starting in the first field of five on every
such card.

These pairs or groups of cards must be prepared for every set of 
items meeting the same criteria. A blank card after the last request 
properly terminates the execution of MEDSIRCH-2. Two requests followed 
by a blank card are illustrated in Figure 4. 

Search technique. The mainline program of MEDSIRCH-2 transfers the 
tape bank onto disk in the same manner as before. It then reads one group 
of cards constituting a request. If the identification numbers of this 
request do not match for these two cards, a diagnostic message indicating
the mismatch is written for the user. For example, if a pair had 
identification numbers of 1 and 10 this message would be given: 
MISMATCHED ID-PAIRS ON THESE PARAMETER CARDS: 1 AND 10. 

The first card of a request is used as information for writing out 
the chart RESTRICTIONS IMPOSED (Table 3). The program also uses the specification
"number of items requested" as a means to determine how many item identi- 
fication numbers to read in the next card(s) of a request. 

Storing in core the item identification numbers read from the
remaining card(s), the computer then searches through the
bank retrieving the items with such identifications. When all items
have been found, search is arrested, disk is rewound and another request
is processed in the same manner.
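
The card handling and search just described can be sketched as follows. This is a hedged illustration in Python rather than the thesis's Fortran IV: the function names (punch, fields_of_five, read_request, search), the treatment of blank fields as absent, and the sample numbers are assumptions; only the fields-of-five layout and the wording of the mismatch message come from the text.

```python
from typing import List, Tuple


def punch(*numbers: int) -> str:
    """Hypothetical helper: lay integers out in right-justified fields of five."""
    return "".join("%5d" % n for n in numbers).ljust(80)


def fields_of_five(card: str) -> List[int]:
    """Split an 80-column card into integer fields of five, skipping blanks."""
    card = card.ljust(80)
    fields = [card[i:i + 5] for i in range(0, 80, 5)]
    return [int(f) for f in fields if f.strip()]


def read_request(cards: List[str]) -> Tuple[int, List[int], List[int]]:
    """Decode one request: first card, second card, optional further cards."""
    first = fields_of_five(cards[0])
    request_id, criteria, wanted = first[0], first[1:5], first[5]
    second = fields_of_five(cards[1])
    if second[0] != request_id:
        raise ValueError(
            "MISMATCHED ID-PAIRS ON THESE PARAMETER CARDS: %d AND %d."
            % (request_id, second[0]))
    item_ids = second[1:]            # up to fifteen ids after the request id
    for card in cards[2:]:           # further cards start in the first field
        item_ids.extend(fields_of_five(card))
    return request_id, criteria, item_ids[:wanted]


def search(bank: List[Tuple[int, str]], item_ids: List[int]) -> List[Tuple[int, str]]:
    """Sequential pass over the bank, stopping once every id has been found."""
    remaining = set(item_ids)
    found = []
    for item_id, text in bank:
        if item_id in remaining:
            found.append((item_id, text))
            remaining.discard(item_id)
            if not remaining:        # all items found: search is arrested
                break
    return found
```

Holding the requested identification numbers in a set mirrors the idea of keeping them in core while the bank is read once from start to finish.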

The user will see no difference between the
output of MEDSIRCH-2 and that of MEDSIRCH-1. The format of the output,
the parameter information, checklists, counter, and so forth are identical
to MEDSIRCH-1. He will, however, receive only the first message
in Table 2 since he has specified in his pair of request cards that all
items retrieved belong to one set of criteria.


Programming for Updating Tape Bank: UPDATE 


The original bank of items for any information storage and retrieval 
system must be modified as new material is acquired (section (viii) of 
Figure 1). In this study, multiple choice items retrieved and used acquire 
new parameter information, such as revised biserial coefficients, diffi- 
culty levels, and so forth. Some of these items may be found to be 
inadequate and necessitate modifying the text of an item. Other documents 
may have to be thrown out because advances in medical knowledge have
shown the item to be irrelevant. Conversely, new medical knowledge
dictates the need for the inclusion of new items. The program UPDATE
(Appendix F) was written in order to accommodate all these possibilities.

As with MEDSIRCH-2 attempts were made to reduce programming effort 
in UPDATE by using some of the techniques of the original program-- 
MEDSIRCH-1. 

Preparation of user's requests. Of the five programs used in this 
study--CHECK, MEDSIRCH-1, MEDSIRCH-2, UTILITY, and UPDATE--the last
requires the most effort on the part of the user, as he must prepare the 
following cards. 

Title card: this first card allows any alphanumeric characters in 
columns 1-80; whatever the user specifies will be used as a title in 
his output.

Parameter card: on this card, in fields of five, the user is
required to give these specifications--(a) the number of items he wishes
to be deleted from the bank, including those items he is modifying; that
is to say, a user must resubmit a whole new item, including its two
index cards, if he wishes to make any modification to the text of an
item; (b) the number of items having their index cards modified; and
(c) the number of items being added as new or modified items.

ID deletion card(s): if the user has specified in the above 
parameter card that items are to be deleted, he must also submit these 
"ID deletion card(s)." The user must punch the identification numbers 
of all items he wishes to be deleted from the bank. As already noted 
this must include those items that will be modified since the modified 
items are treated as new additions. Only 16 identification numbers can 
be punched on a card; if more cards are required numbers are punched on 
those cards in the same format (fields of five). UPDATE imposes no 
restrictions on the number of deletions a user wishes to submit. 

Item index cards: to modify an item's two index cards in store the 
user must prepare these two index cards based on the codes in Appendix A. 
Up to .00o0f these pairs of index cards can be submitted in one run of 
UPDATE. 

New or modified items: these are punched in the same manner as
before (cf. pp. 46-48). The user must also include an item's two index
cards in this submission. UPDATE imposes no limit on the number of new
or modified items being added to the bank; the only restriction one must
consider is the amount of space remaining in the tape bank.

Updating technique

The procedures available for updating a blocked, sequentially
searched data file are limited (cf. p. 14). Therefore, UPDATE uses the
technique of rewriting the bank on a new data set--another tape. In 
practice this is the most advantageous technique, since one has the 
opportunity of checking the validity of the new tape before destroying 
the old. This is not possible when revisions are made on the same tape. 

While holding in core the identification numbers of all items to be
removed, as well as all index cards being inserted as modifications, the
mainline program of UPDATE reads through the old tape. Where no changes
are made, the old tape is simply copied onto the new tape. If the
identification number of an item on the old tape matches any of the
identification numbers of items designated for deletion, that item is not
rewritten. Similarly, if the identification number of an item on the old
tape matches that of index cards being used for modification, the old
index cards are not written; instead the modified pair of index cards is
written. After reaching the end of the old tape, cards containing
the items to be added, whether they are new or modified, are read and
subjected to the analysis used in the program CHECK (section (vii) of
Figure 1). If there are no mistakes they are written on to the new tape.
After the final item has been added, the program writes a record with an
"NE" in column 80 to indicate the end of this new tape.

Diagnostic messages

In order to aid the user in the proper modification of his new bank, 
UPDATE provides diagnostic messages. A counter is provided at each 
point where a mistake can be made. If, at the end of the updating, any 
of these mistakes have been made, a message is given to indicate this: 
***NOTE
THIS ATTEMPT TO UPDATE DATA FILE HAS NOT BEEN PROPERLY DONE. 
THERE ARE [2] MISTAKES MADE. 


REGARD ABOVE MESSAGES TO CORRECT AND RUN THIS PROGRAM AGAIN. 


The ABOVE MESSAGES refer to the same diagnostic messages given in 
Table 1. This reduction in programming effort is made possible since 
the basic program CHECK is used in UPDATE. There is one contingency, 
however, that is unique to this program. UPDATE must not only diagnose 
misplaced parameter cards which accompany a new or modified item (Table 
1), but must also diagnose misplaced parameter cards being inserted as 
modification in themselves. The following message is given for mistakes 
made in this latter case. 
***NOTE

THESE PARAMETER CARDS WHICH ARE BEING USED AS MODIFICATIONS ARE NOT IN ORDER:

FIRST PARAMETER CARD READS: CARD [2] ITEM NO. 143 

SECOND PARAMETER CARD READS: CARD [1] ITEM NO. 1430 
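
The updating pass and its mistake counter, as described in this section, might be sketched as below. This is an assumption-laden Python illustration, not the UPDATE source of Appendix F: records are modelled as dictionaries, each index card as a tuple (card number, item number, codes), and the names END_RECORD, make_item, cards_in_order and update are hypothetical. The ordering check is only a stand-in for the fuller analysis performed by CHECK.

```python
END_RECORD = {"id": None}  # stand-in for the end-of-tape record


def make_item(item_id, text, first_codes="A", second_codes="B"):
    """Hypothetical helper: an item with its two index cards in order."""
    index = ((1, item_id, first_codes), (2, item_id, second_codes))
    return {"id": item_id, "index": index, "text": text}


def cards_in_order(index_pair, item_id):
    """True when the pair reads card 1 then card 2 for the same item."""
    first, second = index_pair
    return (first[0], second[0]) == (1, 2) and first[1] == second[1] == item_id


def update(old_tape, delete_ids, new_indexes, additions):
    """Copy old_tape to a new tape, applying deletions, index changes, additions.

    Returns (new_tape, mistakes); the old tape is left untouched so that the
    new one can be verified before the old is destroyed.
    """
    mistakes = 0
    good_indexes = {}
    for item_id, pair in new_indexes.items():
        if cards_in_order(pair, item_id):
            good_indexes[item_id] = pair
        else:
            mistakes += 1            # misplaced modification index cards
    delete_ids = set(delete_ids)
    new_tape = []
    for item in old_tape:
        if item["id"] is None:       # end of the old tape
            break
        if item["id"] in delete_ids:
            continue                 # designated for deletion: not rewritten
        if item["id"] in good_indexes:
            item = dict(item, index=good_indexes[item["id"]])
        new_tape.append(item)
    for item in additions:           # new or modified items go at the end
        if cards_in_order(item["index"], item["id"]):
            new_tape.append(item)
        else:
            mistakes += 1
    new_tape.append(END_RECORD)
    return new_tape, mistakes
```

A modified item is handled here exactly as the text requires: its identification number appears in delete_ids and the revised item is resubmitted among the additions.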

If the user has made no mistakes in using the UPDATE program, messages
are given to indicate the satisfactory revisions. Table 4 provides a
sample output. The reader will note that in this example items 1, 8,
10, 20, and 25 are modified. As such, messages are given to indicate that
the old, unmodified items (bearing these identification numbers) have been

TABLE 4

TEST RUN FOR NEW PROGRAM UPDATE

THESE ITEMS HAVE BEEN REMOVED :    1
                                   8
                                  10
                                  20
                                  23
                                  31
                                  34
                                  38
                                  40
                                  42

PARAMETER CARDS OF THESE ITEMS ( INDICATED BY THEIR ID NUMBERS )
HAVE BEEN CHANGED:    2
                      3
                      4
                      5
                      6

ITEM  1 HAS BEEN ADDED AS A NEW OR MODIFIED ITEM.
ITEM  8 HAS BEEN ADDED AS A NEW OR MODIFIED ITEM.
ITEM 10 HAS BEEN ADDED AS A NEW OR MODIFIED ITEM.
ITEM 20 HAS BEEN ADDED AS A NEW OR MODIFIED ITEM.
ITEM 25 HAS BEEN ADDED AS A NEW OR MODIFIED ITEM.

***NOTE:

TO BE SURE TAPE HAS BEEN PROPERLY UPDATED RUN MEDSIRCH2 ASKING FOR
ITEMS THAT WERE MODIFIED OR ADDED.

removed, and the modified items (with the same identification) have been
added to the new tape. As a final precaution to the user, he is requested
to run the MEDSIRCH-2 program asking for the specific items he has
modified in order to verify the updating of his new tape bank (section
(v) of Figure 1).


Programming for Efficiency: Job Control Language 

The time taken to do a sequential search can be improved by the 
use of appropriate job control language on systems cards. More detailed 
descriptions are available elsewhere (IBM, 1967; Hazlett, 1969); however, 
the IBM 360/67 system's parameters and subparameters specific to this 
study are now described. 

Data Control Block (DCB). This parameter was used for specifying 
record formats (RECFM), blocksizes (BLKSIZE), logical record lengths 
(LRECL), and densities (DEN). Blocking tends to increase the reading 
efficiency of one's data set by approximately the order of the blocking 
factor. That is to say, if one has a blocking factor of 10, then the
efficiency is approximately 10 times as great as a data set with no 
blocking. There is a point, however, at which efficiency begins to 
drop off; this is determined by the total core storage being used, and 
is peculiar to each program. Blocking was used in this design for both 
the tapes and for the direct access medium of disks. 

Both the programs UTILITY and UPDATE (sections (iv) and (viii) of
Figure 1) essentially write all records onto tape in card image. As
such, the records are fixed length and are under format control. Card
five in the utility program (Appendix C) specifies the exact DCB parameter
for this study. The explanation of each of the subparameters is as
follows.

Since the tape is stacked in card image, RECFM must be fixed blocked
(FB): records are indeed blocked and are the same fixed length as
a card (80 columns). This in turn determines that the maximum READ/WRITE
statement used for the tape must be no longer than 80 columns;
thus LRECL is 80. While tapes may have any blocksize, efficiency was
not noticeably increased after a blocking factor of 90; therefore, 
blocksize was set at 7200 bytes. Since all tapes used on the IBM 360/67 
should be nine-track the only allowable density is 2. 

Before any searches are begun, the entire bank is rewritten onto disk 
without format in order to increase the economy of processing searches. 
The lack of format control presents some difficulty, however, in efficient
blocking. Whereas with format control LRECL was found by counting
the maximum number of characters (columns) in a READ/WRITE statement,
without format control the record length is the maximum number of
variables multiplied by four (the number of bytes per variable). In the text of an
item, a record without format control contains 19 x 4 = 76 bytes. The 
first parameter card contains 40 x 4 = 160 bytes and the second parameter 
cards have 19 x 4 = 76 bytes. Thus the records which were originally of 
fixed length on tape are now, on disk, of variable length. (Since this 
is generally the case, only variable length records may be written 
without format control.) 

Since LRECL is four more bytes than the maximum record length one
would ordinarily specify this as 164. With an optimal blocking factor
of 90 the BLKSIZE would be (90 x 164) + 4 = 14764. (For further explanations
regarding calculating procedures see the literature (IBM, 1967,
pp. 44-46; Hazlett, 1969, pp. 15-19).) This procedure is unacceptable
since maximum BLKSIZE for disk is 7294 bytes.

To circumvent this difficulty all programs--MEDSIRCH-1, MEDSIRCH-2, 
and UPDATE--write the parameter information accompanying each item as 
three, not two, records. The first and second record are 19 variables 
long, hence 76 bytes; the third has 17 variables, therefore, 17 x 4 = 68 
bytes. 

Since the maximum READ/WRITE statement without format control is 
now 76, LRECL is 80. Using a blocking factor of 90 one obtains a 
BLKSIZE of 7204, which is under the maximum size of 7294. Thus one has 
kept the optimal blocking factor of 90 while still keeping within the 
limitation of blocksize for disk.
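
The record-length and blocksize arithmetic of this section can be checked with a short calculation. The sketch below is in Python purely for illustration; the function and constant names are assumptions, while the figures (80 columns, four bytes per variable, the four-byte additions, a blocking factor of 90, and the 7294-byte disk limit) are those quoted in the text.

```python
BYTES_PER_VARIABLE = 4
DISK_MAX_BLKSIZE = 7294  # maximum disk blocksize quoted in the text


def formatted_sizes(columns, blocking_factor):
    """Fixed blocked records under format control: (LRECL, BLKSIZE)."""
    lrecl = columns
    return lrecl, lrecl * blocking_factor


def unformatted_sizes(max_variables, blocking_factor):
    """Variable length records without format control: (LRECL, BLKSIZE).

    LRECL is the longest record plus four bytes, and BLKSIZE is the
    blocked records plus a further four bytes.
    """
    lrecl = max_variables * BYTES_PER_VARIABLE + 4
    return lrecl, lrecl * blocking_factor + 4


print(formatted_sizes(80, 90))    # (80, 7200)   tape, card image
print(unformatted_sizes(40, 90))  # (164, 14764) too large for disk
print(unformatted_sizes(19, 90))  # (80, 7204)   parameters split into shorter records
```

Comparing each BLKSIZE with DISK_MAX_BLKSIZE shows why the 40-variable parameter record had to be split: 14764 exceeds the limit, whereas 7204 does not.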

Separation Subparameter (SEP). One further feature was incorporated 
to optimize the efficiency of the sequentially searched file in MEDSIRCH-1, 
that of providing separate access arms to each of the data sets on disk. 
The reader will recall that four different data sets on disk were used: 
(a) data set one for the main file, and (b) data sets two, three, and 
four containing items retrieved at various hierarchical levels. The 
access arm containing the READ/WRITE head in the computer must move 
through a physical distance when reading or writing on different disks. 
The SEP subparameter decreases access time to data sets by providing 
separate access arms to each data set. Though this separation device
may be ignored by the operating system if an insufficient number of
access arms are available, this feature does optimize execution time
whenever honored.

Space parameter (SPACE). In order to ensure that adequate space
was allotted to the running of these programs, a space parameter was
also included. All contingencies encountered could be handled in the
specification SPACE=(TRK,(10,10)). In other words space was reserved
for 10 tracks, with option of allowing up to 10 more tracks when
previously allocated space was exhausted.

Table 5 contains a listing of systems cards used for the two 
MEDSIRCH programs as well as UPDATE. The other specifications in this 
Table that have not been dealt with in detail here are required when 
using any tapes or disks on the IBM 360/67 and as such need no specific 
description. 

In this chapter the reader has been given the reasons for the choice 
of this particular design in information retrieval, the techniques used 
in indexing and searching, the preparation of user requests and provision
of iterative searches, means for reducing user errors and provision for 
diagnostic messages, and finally suggestions for optimizing the retrieval 
efficiency in terms of time and programming effort. At no point has 
there been a critical evaluation of this design. Since no design can be 
accepted without appraisal one must look at this design's relative worth 
as an information storage and retrieval system. Chapter IV deals with
this evaluation and the implications stemming from such an appraisal.

CHAPTER FOUR


SUMMARY, EVALUATION, AND RECOMMENDATIONS 


Summary 


In order to focus one's attention on those areas of this study
that must be evaluated, a restatement of the important features of this 
design follows. The purpose was to design and implement an efficient, 
broadly useful technique whereby educators could reduce the clerical 
work involved in keeping and selecting multiple choice items for examin- 
ation purposes. Of the three basic data files--sequential, random, and 
list--the first was chosen in order to best meet the needs and demands 
of users and of the capabilities of available computer hardware. Manual 
indexing was also chosen as the most promising means of retrieving items, 
even though the process involved inefficiency and the cost of human 
effort. 

A detailed description was given as to the nature of the indexes 
and items, the format for their specification, and means for storing 
them as a bank on magnetic tape, to be searched with the program 
MEDSIRCH-1. The search technique, the weighting of retrievals, and
nature of output were also explained. In order to allow a user to
receive the most pertinent records, MEDSIRCH-2 was provided for modified 
requests. 

Since no design in the field of information is complete without 
provisions for modifying the store of information, the program UPDATE 
was intended to serve this function, providing means for additions,
deletions, and modifications.

Throughout the entire design, steps were taken to reduce human
effort: (a) user's requests required minimal specifications in the 
most simple of formats; (b) diagnostic messages were provided in the 
output, indicating as clearly as possible the exact type of error made 
by a user; (c) retrievals were titled in such a manner as to indicate
their degree of relevance, with other aids such as check lists and
perforated pages increasing the utility of retrieval; and (d) programming
overlap, while accomplishing different tasks, reduced programming effort and cost.

Furthermore, the attempts made to anticipate future needs, such as 
indexes for audio-visual material, graduate or undergraduate, local or
national, and speciality number will, presumably, reduce future
modifications to the indexing code of Appendix A.

Last but not least, job control language was found to be a means 
for reducing execution time of the programs used in this design. 
Parameters and subparameters, such as data control blocks, record 
formats, blocksizes, separation devices, all contributed to increased 
efficiency. Their exact specification, however, was determined by 
the use of another means in attaining efficiency, that of direct access 
disk. Disks not only allowed the direct transfer of binary information 
but reduced the time for REWIND statements. 

This study has provided no evidence for estimating the cost, the 
efficiency or the degree of recall and relevance and its generality 
or contribution to the field of education and information science.
This, of course, is of importance and will now be discussed.

Evaluation


The evaluation of this design will be carried out under three main 
headings: (a) usefulness, (b) cost, (c) generality. Unfortunately a 
rigorous evaluation of such a design is not possible unless other systems 
can be used to estimate its relative worth. This study has made no 
attempt to implement other designs for this purpose. In many instances, 
therefore, evaluative claims are more subjective than objective, and as 
such must be considered in that light. The literature (Jahoda, 1968; 
Spring, 1967; Vickery, 1965) indicates that most analyses of information 
search procedures, and measurements of retrieval performances, are saddled 
with similar difficulties. This report, however, will give recall, 
relevance and cost figures in order that further studies may use this
design for comparison.


Usefulness 

Ernst (1965) provides two mathematical models with which to eval- 
uate the usefulness and pertinency of retrievals. The first is the 
recall rate--the fraction of the relevant material which is retrieved; 
the second is the precision ratio--the fraction of retrieved material 
which is relevant. Authors (Shoffner, 1968; Tritschler, 1968; Cooper, 
1968; Bennett, 1966; Savage, 1964) criticize such models since relevance 
is difficult to determine. The problem lies in the assumption that there 
is such a person as the user. Users as humans are different and have
different purposes. Relevance, therefore, is not easily dichotomized.
While these authors have pointed out the weaknesses in such an analytical
model they have little to offer in terms of improving it.

In order to standardize the interpretation of recall and precision 
rate this investigator has chosen to define relevance in precise, 
measurable terms for this design: a relevant retrieval is one that 
meets the four restrictions submitted by the user. Though there is 
some loss of information in limiting the meaning to this definition, 
there is reason to assume that recall and precision rates will not be 
overly, if at all, inflated. While it is true that some items meeting 
the four restrictions submitted by the user were not regarded as useful 
by, for example, the reviewer, there were items which were regarded as 
acceptable, though they were retrieved at a lower hierarchical level than
those meeting all four criteria. (In the pilot run of this retrieval
system, about one-half of the items out of a total of 201 that were
presented to the test committee for final screening were of this nature.)

Using this definition of "relevance", calculated recall and precision 
rates would tend to be too great whenever a user rejected "relevant" 
material. Conversely, the calculated rates would be too low whenever a 
user accepted "nonrelevant" material. By virtue of the fact that both 
conditions are present in most, if not all, retrievals, it will be assumed 
that their influence on the recall and precision rates is nullified. 
Provided this assumption is valid, Ernst's analytical model is an 
appropriate one to use for this design. The following rates are reported
in the context of this limitation.

For MEDSIRCH-1 the recall rate was found to be 100%, that is, all
items in the bank that could meet the user's specifications were retrieved. 
Verification of this was done on the card sorter. This excellent recall 
figure is a direct result of using manual instead of automatic indexing. 
Precision rate was considerably poorer. Six hundred items were retrieved 
from the bank which has only 425 items; this redundancy was a direct 
result of not incorporating the sixth specification on the user's request 
--hierarchical level. Of these 600 items 201 were selected by the 
reviewer. Thus the precision rate was about 33%. If the user had used 
the hierarchical level specification, it is reasonable to assume that no
more than 425 items, at most, would have been retrieved. In such a case 
the precision rate would have increased to 47% at least. 
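
The two rates can be reproduced with a few lines of arithmetic. The sketch below uses Python only for illustration; the function names are assumptions, and the counts (201 items selected, 600 retrieved, a bank of 425) are the pilot-run figures quoted above.

```python
def recall_rate(relevant_retrieved, relevant_in_bank):
    """Fraction of the relevant material which is retrieved."""
    return relevant_retrieved / relevant_in_bank


def precision_rate(relevant_retrieved, total_retrieved):
    """Fraction of retrieved material which is relevant."""
    return relevant_retrieved / total_retrieved


# Pilot run of MEDSIRCH-1: 201 of the 600 retrieved items were selected.
print("%.1f%%" % (100 * precision_rate(201, 600)))  # 33.5%

# Estimate if retrieval had been limited to, at most, the 425 items in the bank.
print("%.1f%%" % (100 * precision_rate(201, 425)))  # 47.3%
```

Since 425 is an upper bound on the number of items that could have been retrieved, 47% is a lower bound on the precision rate under that assumption.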

Due to the fact that MEDSIRCH-2 retrieves items according to 
identification numbers, it is not surprising that the recall and precision 
rates in the pilot run were 100%. In practice, therefore, the program 
is only used as a means for grouping all pertinent items retrieved in 
MEDSIRCH-1, and then printing them out in a convenient format. 

The precision rate of MEDSIRCH-1, even estimated at 47%, is 
unacceptable. If an entire bank is retrieved one might as well have the 
items in a manual file; not only would the cost be reduced but time would 
be saved. However, the limited size of the bank and the nature of the 
items stored therein caused much of this difficulty. For example, while 
there were 20 items with the combined specifications of 2-1-2-1 there 
were no items meeting even the first restriction of 12-1-1-1, much less 
all four restrictions. Almost one quarter of all the items were concentrated
in three subspecialities; and even within each of these three
subspecialities there was not a good distribution of items throughout
the hierarchy of criteria. For example, though the above specification
of 2-1-2-1 had 20 items, there were only two items with 2-1-3-1, and none 
with 2-2-3-3. It was not surprising, therefore, that a series of requests 
having little in common with what was available in the bank caused the 
precision rate to be low. 

To circumvent this problem, future users must have at their disposal
frequency tables indicating the distribution of criteria. This, fortunately,
is already available. Reading a tape stacked with only the parameter
cards of all the items, a cross-classification program using simultaneous
subdivisions will provide such information (see documentation by Flathman,
1968). With this a priori knowledge, the precision rate may reach 100%.
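
A frequency table of this kind can be illustrated with a simple count over the four ranked criteria. The Python sketch below is not the cross-classification program documented by Flathman; the function name frequency_table and the small sample bank (built to echo the counts quoted earlier for 2-1-2-1 and 2-1-3-1) are assumptions made for the example.

```python
from collections import Counter


def frequency_table(bank):
    """Count items for every leading combination of the ranked criteria.

    The key (2,) counts items meeting the first restriction, (2, 1) those
    meeting the first two, and so on down to all four restrictions.
    """
    counts = Counter()
    for indexes in bank:
        for depth in range(1, len(indexes) + 1):
            counts[tuple(indexes[:depth])] += 1
    return counts


# Hypothetical bank holding only the four search indexes of each item.
sample_bank = [(2, 1, 2, 1)] * 20 + [(2, 1, 3, 1)] * 2

table = frequency_table(sample_bank)
print(table[(2, 1, 2, 1)])   # 20
print(table[(2, 1, 3, 1)])   # 2
print(table[(2, 2, 3, 3)])   # 0
print(table[(12,)])          # 0
```

Consulting such a table before preparing requests lets a user see at once that a request such as 12-1-1-1 cannot be met at any hierarchical level.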

Another factor detracted from the usefulness of this program. It 
was decided by the test committee using the pilot retrieval that the 
specification "core level" was not a useful criterion for retrieval.
Since an examination was intended to cover all 23 subspecialities at
three taxonomic levels while still using a reasonably short examination, 
it was felt the Department of Internal Medicine could not afford the 
luxury of testing anything but essential material. (Whether or not all 
specialities will feel the same way remains to be seen.) In any case, 
for future retrievals in Internal Medicine search criteria will be 
reduced to three specifications. 

Two of Dammers' (1968) objectives in a retrieval system are simplicity
of input and a directly useable output. Since the MEDSIRCH programs
use a consistent format (fields of five) and require minimal preparation
of requests, a claim can be made that the first objective of Dammers is
met. Since an item is printed in a manner that can be directly typed
into an examination (along with the other user aids that were noted,
cf. pp. 98-65), the objective of useable output is at least met to a fair
degree. However, when one considers the precision rate of the first
retrieval for MEDSIRCH-1 it must be said that the output was not "directly"
useable.


Cost

Since monetary cost is a localized estimate, only cost in terms of
time will be discussed. Table 6 provides the estimated cost per item
entailed in the pilot run of this information storage and retrieval system.

One will notice that the cost of preparing items for storage--typing
and keypunching--is twice the amount entailed in a manual file. The time
spent selecting items from MEDSIRCH-1 for requests to MEDSIRCH-2 can be
very great whenever 600 items must be manually searched. The work entailed
by the reviewer would be common to any retrieval system--manual or machine.

The time taken to retrieve an item is probably less than that required
for a manual search, even if a person were not sequentially searching
a file as the computer did. Blocking tape and disks is directly responsible
for the small amount of time spent in reading and writing; in fact
this input/output process entailed only about 5% of the execution time.
However, with today's high speed computers such as the IBM 360/67 one
cannot help but wonder how much time could be saved with the use of
random direct access or list files.

TABLE 6

COST PER ITEM
(in terms of minutes per item)

Task                                                   Minutes per item

MEDSIRCH-1 retrieval for MEDSIRCH-2                    [illegible]
Reviewer's work in checking content, spelling, etc.    5.0

Computer                Minutes per item retrieved or updated

Program                 Reading/Writing      Total Execution

MEDSIRCH-1              0.003                0.048
MEDSIRCH-2              0.002                0.040
UPDATE                  [illegible]          0.034

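
As a small worked check, the share of execution time taken by reading and writing can be computed from the legible computer entries of Table 6. The dictionary and function names below are illustrative only; the UPDATE row is omitted because its reading/writing figure is not legible.

```python
# (reading/writing, total execution) in minutes per item, from Table 6.
timings = {
    "MEDSIRCH-1": (0.003, 0.048),
    "MEDSIRCH-2": (0.002, 0.040),
}


def io_share(read_write, total):
    """Percentage of execution time spent on input/output."""
    return 100.0 * read_write / total


shares = {name: io_share(rw, total) for name, (rw, total) in timings.items()}
for name, share in shares.items():
    print(f"{name}: {share:.2f}% of execution time is input/output")
```

Both programs spend roughly one twentieth of their execution time on input/output, which is the order of magnitude quoted in the discussion of cost.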
Generality


One objective of this design was the broad application of such
a system to other educational areas. This objective was at least partially
fulfilled. Provided the multiple choice questions are punched according
to format requirements (cf. pp. 46-48), all programs will handle the text of
items regardless of the educational area. The indexing code in Appendix
A is not so generalizable. Specifications such as subspecialty, second
area of subspecialty, source, graduate or undergraduate, local or
national, and specialty number, which are presently converted by means of
data statements in the two MEDSIRCH programs, refer to the exact needs of
Internal Medicine and could not be used for grade twelve examinations,
for example. Even for other medical specialties the subspecialty codes
are not applicable. There are means for adjusting the MEDSIRCH programs
to fit most needs, however.

To modify the programs for use by other medical departments involves 
the least work. For the specialty code one would have to take out the 
present data statements and insert ones containing references to the 
particular department making the modification. Since two columns are 
provided for these specifications on the first parameter card one could 
have up to 99 subspecialties. 
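
The effect of exchanging one set of data statements for another can be sketched with table-driven decoding. This is an illustration only: the Internal Medicine entries are an excerpt of Appendix A, while `HYPOTHETICAL_SURGERY` and `make_decoder` are invented for the example.

```python
# Excerpt of the Internal Medicine subspecialty codes from Appendix A.
INTERNAL_MEDICINE = {1: "ALL", 2: "CVS", 3: "COL", 4: "DERM"}

# Hypothetical table another department might substitute.
HYPOTHETICAL_SURGERY = {1: "GEN", 2: "ORTH"}


def make_decoder(table, width=2):
    """Return a lookup function for codes punched in a field `width` columns wide."""
    limit = 10 ** width - 1  # two columns allow codes 1 to 99
    for code in table:
        if not 1 <= code <= limit:
            raise ValueError(f"code {code} does not fit in {width} columns")

    def decode(code):
        # An unknown or blank (zero) code decodes to an empty label.
        return table.get(code, "")

    return decode


decode_medicine = make_decoder(INTERNAL_MEDICINE)
decode_surgery = make_decoder(HYPOTHETICAL_SURGERY)
print(decode_medicine(2), decode_surgery(2))
```

Only the table changes between departments; the search logic that consumes the numeric codes is untouched, which is why this modification involves the least work.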

To adapt the program for use in the school system would require
changing more than one data statement, but still involves little effort
relative to the time required in designing a new system. For example,
the two codes for subspecialty could be changed to subject area (Physics,
Mathematics, Social Studies, etc.) and section (levers, polynomials,
geography, etc.); province could be changed to grade, taxonomic level
modified to include the hierarchical levels suggested by Bloom, and
source to cognitive or affective domains. These are, of course, only
suggestions; the precise nature of codes would be determined by consulting
the intended users. One would not be restricted to keeping exactly the
same format for these codes, nor would one be required to use the same
order of criteria or even the same criteria. If any of these modifications
were desired, however, more programming effort would be involved.
Regardless of the number of changes made, this system's file organization,
search technique, and optimization procedures can provide a means for
reducing considerably the time required for creating an information
storage and retrieval system for anyone using multiple choice questions.

Finally, to use this design's programs at other installations,
Appendix G provides a list of hardware specifications that must be
met. Included in this appendix are modifications that may be made in
order to tailor the design to smaller or less sophisticated computers.


Recommendations


Suggestions have already been made as to how the retrieval process
of this design may be improved, namely, the use of the hierarchical
level specification, the use of cross-classification tables, and the
removal of the restriction core level. Whenever the bank acquires an
equal frequency distribution of items throughout all possible levels it
would be desirable to limit the number of retrievals to the exact number
of items the user requests since, at present, all items are retrieved at
a given hierarchical level. Given more items than needed by the present
search technique, a ranking procedure based on the size of correlation
coefficients (between a user's request and an item's remaining indexes)
could be incorporated. For example, one could use further specifications
such as second area of subspecialty, language, number of times used,
last year question used, difficulty levels, and biserial coefficients
as binary elements in each of two vectors--a user's request and an item's
remaining information. The angle between these two vectors
would then be calculated for all items. With items ordered according to
the relative size of their angles, retrieval would take place until the exact
number of items requested was met. Since new items would have missing
information for most of these specifications one might want to retrieve
these first and then retrieve whatever else is needed according to this
ranking procedure.
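
The proposed ranking can be sketched as follows. This is one possible reading of the recommendation, not an implemented part of the MEDSIRCH programs: the names `angle` and `rank`, the encoding of the binary vectors, and the sample items are all assumptions made for illustration.

```python
import math


def angle(u, v):
    """Angle in radians between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def rank(request, items, quota):
    """Return at most `quota` item IDs.

    Items with no recorded information (all-zero vectors) come first, as
    suggested for new items; the rest follow in order of increasing angle
    to the request vector.
    """
    new_items = sorted(i for i, vec in items.items() if not any(vec))
    scored = sorted(
        (i for i, vec in items.items() if any(vec)),
        key=lambda i: (angle(request, items[i]), i),
    )
    return (new_items + scored)[:quota]


request = [1, 1, 0, 1]
items = {
    101: [1, 1, 0, 1],  # identical to the request: angle 0
    102: [1, 0, 0, 0],  # partial overlap
    103: [0, 0, 1, 0],  # no overlap: angle of 90 degrees
    104: [0, 0, 0, 0],  # new item with no recorded information
}
print(rank(request, items, 3))
```

Retrieval stops as soon as the quota is filled, so the user receives exactly the number of items requested rather than every item at a hierarchical level.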

As the evaluation of this program is very subjective, further 
programming should be done with the bank, using random direct access and 
list file organizations, in order to determine which is the best design 
in terms of time, cost, and effort. Findings may support Climenson's 
(1967) claim that combinations of these file systems provide optimal 
means for retrieval. 

Despite the number of recommendations one can make, it is the
opinion of this writer that most, if not all, attempts to improve or
evaluate present information storage and retrieval designs are analogous
to a scheme for improving the destruction of an iceberg by blasting it
from the top. When the full truth about the bottom of the iceberg--
that is, retrieval design--is fully realized, the time spent in estimating
the relative worth of this design will be seen as poorly spent. Until
such a time, simple systems such as the one described in this study
seem to offer a reasonable means for storing and retrieving multiple
choice questions.
Please note: to correctly code each item remember these three points.

1.  TWO parameter cards are needed for each item. Even if no
    item information is available for the second card it is
    still necessary to punch the specialty number 01 (Internal
    Medicine) in col. 74-75, card number "2" in col. 76, and
    item ID in col. 77-80, and place this otherwise blank card
    with the first parameter card.

2.  It is imperative that information for the first four variables
    [(a) subspecialty, (b) type of question, (c) taxonomic
    level, and (d) core level] be given. If any one of the
    first four is omitted the item will never be selected.

3.  Use right justification throughout.


Column

1-2       Area of Subspecialty

          Punch " 1"   Allergy, Immunology, Serology               (ALL)
                " 2"   Cardiovascular                              (CVS)
                " 3"   Collagen Diseases                           (COL)
                " 4"   Dermatology                                 (DERM)
                " 5"   Chemical of Physical Agents                 (PHYSCHEM)
                " 6"   Endocrinology and Metabolism                (END MET)
                " 7"   Gastrointestinal (incl. Liver & Pancreas)   (GI)
                " 8"   Hematology                                  (HEMAT)
                " 9"   Infectious Diseases                         (INF)
                "10"   Musculoskeletal                             (SKEL)
                "11"   Neurology                                   (NEUR)
                "12"   Psychological Medicine                      (PSYC)
                "13"   Pulmonary                                   (PULM)
                "14"   Renal                                       (REN)
                "15"   Therapeutics                                (THER)
                "16"   Anatomy                                     (ANAT)
                "17"   Biochemistry                                (BIOC)
                "18"   Genetics                                    (GEN)
                "19"   Laboratory Medicine                         (LABMED)
                "20"   Microbiology                                (MICROB)
                "21"   Pathology                                   (PATH)
                "22"   Pharmacology                                (PHARM)
                "23"   Physiology                                  (PHYSIOL)

Column

3         Type of Question

          Punch "1"    Single Answer                               (SING ANS)
                "2"    Multiple Answer                             (MULT ANS)

4         Taxonomic Level

          Punch "1"    Factual                                     (FACT)
                "2"    Comprehension                               (COMP)
                "3"    Problem Solving                             (PROB)

5         Core Level

          Punch "1"    Essential                                   (ESS.)
                "2"    More Important                              (IMP.)
                "3"    More Unimportant                            (UIMP)

6-7       Second Area of Subspecialty (re. Col. 1-2)

8         Source

          Punch "1"    American Board of Internal Medicine         (AMIB)
                "2"    National Board of Medical Education         (NBME)
                "3"    Canada                                      (CAN)
                "4"    United Kingdom                              (UK)
                "5"    Other                                       (OTH)

9-10      Province

          Punch " 1"   Alberta                                     (ALTA)
                " 2"   British Columbia                            (B.C.)
                " 3"   Dalhousie                                   (DALH)
                " 4"   Laval                                       (LAVL)
                " 5"   McGill                                      (MCG)
                " 6"   McMaster                                    (MCM)
                " 7"   Manitoba                                    (MAN)
                " 8"   Montreal                                    (MTRL)
                " 9"   Ottawa                                      (OTT)
                "10"   Queens                                      (QN)
                "11"   Saskatchewan                                (SASK)
                "12"   Sherbrooke                                  (SHRB)
                "13"   Toronto                                     (TOR)
                "14"   Western Ontario                             (UWO)
                "15"   Calgary                                     (CALG)
                "16"   Memorial                                    (MMRL)

11        Audio-Visual

          Punch "1"    Line
                "2"    Photo
                "3"    Color
                "4"    Slide
                "5"    Movie
                "6"    Video

12-14     Audio-Visual I.D. location

15-19     Correct Response Alternative(s)

          Punch "1"    in col. 15 if choice 1 is correct
                "1"    in col. 16 if choice 2 is correct
                "1"    in col. 17 if choice 3 is correct
                "1"    in col. 18 if choice 4 is correct
                "1"    in col. 19 if choice 5 is correct

20        Language

          Punch "1"    available in both English and French
                "2"    available in English only
                "3"    available in French only

21        Number of times used
          (If zero there should be no more entries on this or the next
          parameter card except for specialty, card and item ID #)

22-23     Last year question used

24-25     Number of question on last exam

26        Punch "1"    Graduate exam
                "2"    Undergraduate exam

27        Punch "1"    National exam
                "2"    Local exam

28-30     ID of exam

31-34     Number of examinees on last exam


Use col. 35-42 only for single-answer type of questions. Use col. 43-66
of parameter card 1 and col. 1-24 of parameter card 2 only for multiple-
answer type of questions.


Column

35-36     "p" for last recorded testing year (single-answer type of
          question)

37-38     "p" for second last recorded testing year (single-answer
          type of question)

39-40     "r" for last recorded testing year (single-answer type of
          question)

41-42     "r" for second last recorded testing year (single-answer
          type of question)

          Multiple-Answer Type of Question: "p" for last recorded
          testing year.

43-44     First choice
45-46     Second choice
47-48     Third choice
49-50     Fourth choice
51-52     Fifth choice
53-54     Total item

          Multiple-Answer Type of Question: "r" for last recorded
          testing year.

55-56     First choice
57-58     Second choice
59-60     Third choice
61-62     Fourth choice
63-64     Fifth choice
65-66     Total item

74-75     Specialty: Punch "01" for Internal Medicine

76        Punch "1"

77-80     Item ID (must be identical to the number in col. 74-77 of
          cards carrying the text of the item).

Parameter Card 2

Column

          Multiple-Answer Type of Question: "p" for second last
          recorded testing year.

1-2       First choice
3-4       Second choice
5-6       Third choice
7-8       Fourth choice
9-10      Fifth choice
11-12     Total item

          Multiple-Answer Type of Question: "r" for second last
          recorded testing year.

13-14     First choice
15-16     Second choice
17-18     Third choice
19-20     Fourth choice
21-22     Fifth choice
23-24     Total item

          Proportion on last test selecting these choices

25-26     First choice
27-28     Second choice
29-30     Third choice
31-32     Fourth choice
33-34     Fifth choice

74-75     Specialty: Punch "01" for Internal Medicine

76        Punch "2"

77-80     Item ID (must be identical to ID in col. 77-80 of first
          card)
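
The fixed-column layout of the first parameter card can be illustrated by a short decoding sketch. It is not part of the MEDSIRCH programs; the field names and the functions `parse_card` and `is_selectable` are hypothetical, the single columns 4, 5 and 8 are inferred from the order of the fields, and the statistical columns 35-66 are left out for brevity.

```python
# (name, first column, last column), columns numbered from 1 as on the card.
FIELDS = [
    ("subspecialty", 1, 2), ("question_type", 3, 3),
    ("taxonomic_level", 4, 4), ("core_level", 5, 5),
    ("second_subspecialty", 6, 7), ("source", 8, 8),
    ("university", 9, 10), ("audio_visual", 11, 11),
    ("audio_visual_id", 12, 14), ("language", 20, 20),
    ("times_used", 21, 21), ("last_year_used", 22, 23),
    ("last_question_number", 24, 25), ("graduate", 26, 26),
    ("national", 27, 27), ("exam_id", 28, 30), ("examinees", 31, 34),
    ("specialty", 74, 75), ("card_number", 76, 76), ("item_id", 77, 80),
]


def parse_card(card):
    """Decode parameter card 1; a blank field is read as zero."""
    card = card.ljust(80)
    fields = {}
    for name, first, last in FIELDS:
        text = card[first - 1:last].strip()
        fields[name] = int(text) if text else 0
    # Columns 15-19 hold a "1" for each correct alternative.
    fields["correct"] = [n for n in range(1, 6) if card[13 + n] == "1"]
    return fields


def is_selectable(fields):
    """An item can only be retrieved if its first four codes are all given."""
    keys = ("subspecialty", "question_type", "taxonomic_level", "core_level")
    return all(fields[key] > 0 for key in keys)


# Hypothetical card: cardiovascular, single answer, comprehension, essential,
# choice 1 correct, English only, never used, item 42.
card = " 2121" + " " * 9 + "10000" + "2" + "0" + " " * 52 + "01" + "1" + "0042"
fields = parse_card(card)
print(fields["subspecialty"], fields["correct"], fields["item_id"])
```

Reading a blank field as zero is also why an item with any of its first four codes omitted can never match a request.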
C     MEDSIRCH-1        DIVISION OF EDUCATIONAL RESEARCH SERVICES
C                       UNIVERSITY OF ALBERTA
C
C     SEARCH FOR MEDICAL EXAMINATION QUESTIONS
C     ROYAL COLLEGE OF PHYSICIANS AND SURGEONS
C     DEPARTMENT OF INTERNAL MEDICINE
C
C     PROGRAMMER: C.B.HAZLETT
C
C     PURPOSE:
C     1. READS MULTIPLE CHOICE ITEMS FROM TAPE AND SELECTS
C        THOSE MEETING THE FIRST FOUR RESTRICTIONS USER REQUIRES.
C     2. IF USER REQUIRES MORE ITEMS THAN NUMBER MEETING FOUR
C        RESTRICTIONS, THOSE MEETING THE FIRST THREE (THEN TWO,
C        THEN ONE) ARE ALSO GIVEN, IF REQUESTED.
C
C     CARD INPUT:
C     1. PARAMETER CARD
C        A. USE RIGHT JUSTIFICATION.
C        B. ONLY ONE CARD NECESSARY FOR EACH SET OF RESTRICTIONS.
C     2. LAST CARD: BLANK CARD
C
C     DESCRIPTION OF PARAMETER CARD (6I5)
C
C     MED   - AREA OF SUBSPECIALITY
C           - IF MED= 1 ALLERGY, IMMUNOLOGY, SEROLOGY
C                   = 2 CARDIOVASCULAR
C                   = 3 COLLAGEN DISEASES
C                   = 4 DERMATOLOGY
C                   = 5 CHEMICAL OF PHYSICAL AGENTS
C                   = 6 ENDOCRINOLOGY AND METABOLISM
C                   = 7 GASTROINTESTINAL (INCLUDING LIVER,PANCREAS)
C                   = 8 HEMATOLOGY
C                   = 9 INFECTIOUS DISEASES
C                   =10 MUSCULOSKELETAL
C                   =11 NEUROLOGY
C                   =12 PSYCHOLOGICAL MEDICINE
C                   =13 PULMONARY
C                   =14 RENAL
C                   =15 THERAPEUTICS
C                   =16 ANATOMY
C                   =17 BIOCHEMISTRY
C                   =18 GENETICS
C                   =19 LABORATORY MEDICINE
C                   =20 MICROBIOLOGY
C                   =21 PATHOLOGY
C                   =22 PHARMACOLOGY
C                   =23 PHYSIOLOGY
C     NTYP  - TYPE OF QUESTION
C           - IF NTYP=1 SINGLE ANSWER
C                    =2 MULTIPLE ANSWER
C     NTAX  - TAXONOMIC LEVEL
C           - IF NTAX=1 FACTUAL
C                    =2 COMPREHENSION
C                    =3 PROBLEM SOLVING
C     NCORE - CORE LEVEL
C           - IF NCORE=1 ESSENTIAL MATERIAL
C                     =2 MORE IMPORTANT THAN UNIMPORTANT MATERIAL
C                     =3 MORE UNIMPORTANT THAN IMPORTANT MATERIAL
C     NUM   - NUMBER OF ITEMS DESIRED WITH THESE RESTRICTIONS
C     ITR   - HIERARCHICAL LEVEL
C           - IF ITR=1 ALL FOUR LEVELS, IF NECESSARY
C                   =2 ONLY THREE LEVELS
C                   =3 ONLY TWO LEVELS
C                   =4 FIRST LEVEL ONLY
C
C     REMARKS
C     LIMITATIONS
C           -MAX 100 CARDS PER ITEM
C           -MAX 9999 ITEMS
C
C     SUBROUTINES
C           SORT
C           PARMTR
C           TEXT
C
      DIMENSION STEM(100,17),NQ(100),NPT(100),IX(22),FX(33),NAM(2,23),
     1NCH(2,2),NTA(1,3),NES(1,3)
      DATA NAM/'ALL ','    ','CVS ','    ','COL ','    ','DERM','    ',
     1'PHYS','CHEM','END ','MET ','GI  ','    ','HEMA','T   ','INF ',
     2'    ','SKEL','    ','NEUR','    ','PSYC','    ','PULM','    ',
     3'REN ','    ','THER','    ','ANAT','    ','BIOC','    ','GEN ',
     4'    ','LABM','ED  ','MICR','OB  ','PATH','    ','PHAR','M   ',
     5'PHYS','IOL '/
      DATA IC/'C'/,IBLK/' '/,IA/'*'/,IE/'E'/
      DATA NCH/'SING',' ANS','MULT',' ANS'/
      DATA NTA/'FACT','COMP','PROB'/
      DATA NES/'ESS.','IMP.','UIMP'/
C
C     DEFINITIONS:
C     NCI   : COUNTER FOR NUMBER OF ITEMS PRINTED
C     NITEM1: COUNTER FOR NUMBER OF ITEMS MEETING FIRST 4 RESTRICTIONS
C     NITEM2: COUNTER FOR NUMBER OF ITEMS MEETING FIRST 3 RESTRICTIONS
C     NITEM3: COUNTER FOR NUMBER OF ITEMS MEETING FIRST 2 RESTRICTIONS
C     NITEM4: COUNTER FOR NUMBER OF ITEMS MEETING FIRST 1 RESTRICTIONS
C
      CALL PDISC
  100 CONTINUE
      NCI=0
      NITEM1=0
      NITEM2=0
      NITEM3=0
      NITEM4=0
      NDISC=0
      REWIND 1
      REWIND 2
      REWIND 3
      REWIND 4
      READ(5,1)MED,NTYP,NTAX,NCORE,NUM,ITR
    1 FORMAT(6I5)
      IF(MED.EQ.0) GO TO 27
      WRITE(6,2)MED,(NAM(I,MED),I=1,2),NTYP,(NCH(I,NTYP),I=1,2),NTAX,
     1(NTA(I,NTAX),I=1,1),NCORE,(NES(I,NCORE),I=1,1),NUM
    2 FORMAT(1H1,10X,'RESTRICTIONS IMPOSED:',//,
     115X,'AREA OF SUBSPECIALTY',10X,I2,2X,'- ',2A4,/,
     215X,'TYPE OF QUESTION',15X,I1,2X,'- ',2A4,/,
     315X,'TAXONOMIC LEVEL',16X,I1,2X,'- ',1A4,/,
     415X,'CORE LEVEL',21X,I1,2X,'- ',1A4,/,
     515X,'NUMBER OF ITEMS REQUESTED',3X,I4,//)
    3 CONTINUE
      J=0
    4 CONTINUE
C READ # OF CARDS CONTAINING ITEM(STEM(J,K))
      J=J+1
      READ(1)(STEM(J,K),K=1,17),NQ(J),NPT(J)
C IF E END OF TAPE IS SENSED
      IF(NPT(J).EQ.IE) GO TO 22
C END OF ITEM SENSED IF NPT(J) IS BLANK. THEN READS TWO PARAMETER CARDS
C ACCOMPANYING EACH ITEM.
      IF(NPT(J).NE.IBLK) GO TO 4
      READ(1)(IX(I),I=1,19)
      READ(1)(IX(I),I=20,22),(FX(K),K=1,16)
      READ(1)(FX(K),K=17,33)
C IF ITEM DOES NOT MEET FIRST RESTRICTION READ NEXT ITEM
      IF(IX(1).EQ.MED) GO TO 7
      GO TO 3
    7 CONTINUE
C IF ITEM MEETS ONLY FIRST RESTRICTION WRITE IT ON TAPE 4
C AND INCREMENT NITEM4
      IF(IX(2).EQ.NTYP) GO TO 11
      IF(ITR.GT.1) GO TO 3
      NITEM4=NITEM4+1
      DO 9 L=1,J
      WRITE(4)(STEM(L,K),K=1,17),NQ(L),NPT(L)
    9 CONTINUE
      WRITE(4)(IX(I),I=1,19)
      WRITE(4)(IX(I),I=20,22),(FX(K),K=1,16)
      WRITE(4)(FX(K),K=17,33)
   10 CONTINUE
      GO TO 3
   11 CONTINUE
C IF ITEM MEETS ONLY FIRST TWO RESTRICTIONS WRITE ON TAPE 3
C AND INCREMENT NITEM3
      IF(IX(3).EQ.NTAX) GO TO 15
      IF(ITR.GT.2) GO TO 3
      NITEM3=NITEM3+1
      DO 13 L=1,J
      WRITE(3)(STEM(L,K),K=1,17),NQ(L),NPT(L)
   13 CONTINUE
      WRITE(3)(IX(I),I=1,19)
      WRITE(3)(IX(I),I=20,22),(FX(K),K=1,16)
      WRITE(3)(FX(K),K=17,33)
   14 CONTINUE
      GO TO 3
   15 CONTINUE
C IF ITEM MEETS ONLY FIRST THREE RESTRICTIONS WRITE ON TAPE 2
C AND INCREMENT NITEM2
      IF(IX(4).EQ.NCORE) GO TO 19
      IF(ITR.GT.3) GO TO 3
      NITEM2=NITEM2+1
      DO 17 L=1,J
      WRITE(2)(STEM(L,K),K=1,17),NQ(L),NPT(L)
   17 CONTINUE
      WRITE(2)(IX(I),I=1,19)
      WRITE(2)(IX(I),I=20,22),(FX(K),K=1,16)
      WRITE(2)(FX(K),K=17,33)
   18 CONTINUE
      GO TO 3
   19 CONTINUE
C IF ITEM MEETS ALL FOUR RESTRICTIONS PRINT AND INCREMENT NITEM1
      NITEM1=NITEM1+1
      IF(NITEM1.NE.1) GO TO 29
      WRITE(6,28)
   28 FORMAT(1H0,30X,
     1'THE FOLLOWING ITEMS MEET THE ABOVE RESTRICTIONS:',//)
   29 CONTINUE
      NCI=NCI+1
      WRITE(6,33) NCI,(STEM(1,K),K=1,17),NQ(1)
C (FIELD WIDTHS OF FORMAT 33 ARE PARTLY ILLEGIBLE IN THE LISTING)
   33 FORMAT(5X,I4,1X,17A4,10X,'ITEM ID IS ',I4)
C IF NPT(J) IS A * SKIP A LINE IN PRINTING
      IF(NPT(1).EQ.IC) GO TO 35
      WRITE(6,36)
   36 FORMAT(1H )
   35 CONTINUE
      DO 21 L=2,J
      WRITE(6,20)(STEM(L,K),K=1,17)
   20 FORMAT(10X,17A4)
      IF(NPT(L).EQ.IC) GO TO 21
      WRITE(6,101)
  101 FORMAT(1H )
   21 CONTINUE
      CALL PARMTR (IX,FX)
   38 CONTINUE
      CALL TEXT
      WRITE(6,32)
   32 FORMAT(1H1)
      GO TO 3
   22 CONTINUE
      IF(NITEM1.GE.NUM)GO TO 37
      IF(ITR.EQ.4) GO TO 37
      REWIND 2
      REWIND 3
      REWIND 4
C TO OBTAIN AS MANY ITEMS AS REQUESTED(NUM), CHECK FOR HOW MANY SCRATCH
C DISCS (NUMBERED 2 TO 4, EACH OF WHICH CONTAINS ONLY FIRST 3,2,OR 1
C RESTRICTION RESPECTIVELY) ARE NEEDED. READ THESE DISCS IN SORT
      NTOT=NITEM1+NITEM2
      NTOTA=NITEM3+NTOT
      NTOTAL=NITEM4+NTOTA
      IF(NTOT.GE.NUM) GO TO 24
      IF(ITR.EQ.3) GO TO 24
      IF(NTOTA.GE.NUM)GO TO 25
      IF(ITR.EQ.2) GO TO 25
      NDISC=4
      GO TO 26
   24 CONTINUE
      NDISC=2
      GO TO 26
   25 CONTINUE
      NDISC=3
   26 CONTINUE
      CALL SORT (NDISC,NITEM1,NITEM2,NITEM3,NITEM4,NUM,IC,IBLK,IA,IE,
     1NTOT,NTOTA,NTOTAL,NCI)
   37 CONTINUE
      GO TO 100
   27 CONTINUE
      REWIND 8
      STOP
      END
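
The hierarchical search performed by the main program and SORT can be summarized in a short sketch. It is a simplified model rather than a translation of the Fortran: the function name `search` and the `levels` argument (how many hierarchical levels may be used, 1 to 4) are assumptions, and the scratch files are replaced by in-memory lists.

```python
def search(bank, request, quota, levels=4):
    """Return item IDs, best matches first.

    bank    -- list of (item_id, codes), codes being the four index codes
    request -- the four codes requested, in order of priority
    quota   -- number of items the user asked for
    levels  -- how many hierarchical levels may be drawn on
    """
    buckets = {4: [], 3: [], 2: [], 1: []}
    for item_id, codes in bank:
        depth = 0  # number of leading restrictions the item meets
        for have, want in zip(codes, request):
            if have != want:
                break
            depth += 1
        if depth:
            buckets[depth].append(item_id)

    result = list(buckets[4])
    for depth in (3, 2, 1)[:levels - 1]:
        if len(result) >= quota:
            break
        # As in the program, a whole level is added even if this
        # overshoots the quota.
        result.extend(buckets[depth])
    return result


bank = [
    (1, (2, 1, 2, 1)),   # meets all four restrictions
    (2, (2, 1, 2, 3)),   # meets the first three
    (3, (2, 1, 3, 1)),   # meets the first two
    (4, (2, 2, 1, 1)),   # meets the first only
    (5, (12, 1, 2, 1)),  # meets none
]
print(search(bank, (2, 1, 2, 1), quota=3))
```

Because a restriction only counts if all earlier ones are met, an item whose subspecialty differs is never retrieved, however well its other codes agree with the request.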
      SUBROUTINE PDISC
      DIMENSION STEM(100,17),NQ(100),NPT(100),IX(22),FX(33)
      DATA IC/'C'/,IBLK/' '/,IA/'*'/,IE/'E'/
      REWIND 8
    3 CONTINUE
      J=0
    4 CONTINUE
C READ # OF CARDS CONTAINING ITEM(STEM(J,K))
      J=J+1
      READ(8,5)(STEM(J,K),K=1,17),NQ(J),NPT(J)
    5 FORMAT(1X,17A4,4X,I4,2X,A1)
      IF(J-1)63,63,59
   59 CONTINUE
C CHECK FOR CONSISTENT ITEM ID # (NQ(J))
      IF(NQ(J)-NQ(J-1))60,62,60
   60 CONTINUE
      WRITE(6,61)NQ(J),NQ(J-1)
   61 FORMAT(/////,1X,'****NOTE,NOTE,NOTE,NOTE****',/,6X,
     1'MISMATCH OF IDS WITHIN ITEM ',I4,' AND ',I4,/////)
   62 CONTINUE
   63 CONTINUE
C CHECK FOR INVALID CHARACTER IN NPT(J). ALLOW C,E,*, OR BLANK
      IF(NPT(J).NE.IC.AND.NPT(J).NE.IBLK.AND.NPT(J).NE.IA.AND.NPT(J).NE.
     1IE)GO TO 41
      GO TO 43
   41 CONTINUE
      WRITE(6,42) NQ(J), NPT(J), J
   42 FORMAT(/////,1X,'****NOTE NOTE NOTE NOTE NOTE',/,6X,'ITEM ',I4,
     1' HAS THIS INCORRECT ALPHABETIC ENDING:(',A1,') ON CARD',I3,
     2/////)
   43 CONTINUE
C IF E END OF TAPE IS SENSED
      IF(NPT(J).EQ.IE) GO TO 22
C END OF ITEM SENSED IF NPT(J) IS BLANK. THEN READS TWO PARAMETER CARDS
C ACCOMPANYING EACH ITEM.
      IF(NPT(J).NE.IBLK) GO TO 4
      READ(8,6)(IX(I),I=1,22),(FX(K),K=1,16),NCARD1,ID1,
     1(FX(K),K=17,33),NCARD2,ID2
C (FORMAT 6 IS LARGELY ILLEGIBLE IN THE LISTING; THE FIELD WIDTHS
C BELOW FOLLOW THE COLUMN LAYOUT OF THE CODING SHEET IN APPENDIX A)
    6 FORMAT(I2,3I1,2(I2,I1),I3,5I1,2I1,2I2,2I1,I3,I4,16F2.2,9X,I1,I4,
     1/,17F2.2,41X,I1,I4)
C CHECK FOR MATCH OF ITEM ID#(NQ(J)) AND FIRST PARAMETER CARD ID # (ID1)
      IF(NQ(J)-ID1)56,58,56
   56 CONTINUE
      WRITE(6,57) NQ(J),ID1
   57 FORMAT(/////,1X,'****NOTE NOTE NOTE NOTE****',/,6X,
     1'MISMATCH OF IDS BETWEEN ITEM AND PARAMETER CARD:',/,6X,
     2'ITEM ID IS:',I4,/,6X,'PARAMETER ID IS:',I4,/////)
      GO TO 3
   58 CONTINUE
C CHECK FOR ORDERING OF PARAMETER CARDS (NCARD1 AND NCARD2)
      IF(NCARD2-NCARD1)50,50,52
   50 CONTINUE
      WRITE(6,51) NCARD1,ID1,NCARD2,ID2
   51 FORMAT(/////,1X,'****NOTE NOTE NOTE NOTE ',/,5X,
     1'PARAMETER CARDS ARE OUT OF ORDER.',/,5X,
     2'FIRST PARAMETER CARD READS:CARD ',I1,2X,'ITEM # ',I4,/,5X,
     3'SECOND PARAMETER CARD READS: CARD ',I1,2X,'ITEM #',I4,/////)
      GO TO 3
   52 CONTINUE
C CHECK FOR MATCH OF PARAMETER CARDS ID #'S (ID1 AND ID2)
      IF(ID1-ID2)53,54,53
   53 CONTINUE
      WRITE(6,51) NCARD1,ID1,NCARD2,ID2
      GO TO 3
   54 CONTINUE
C ITEM HAS PASSED ALL CHECKS. WRITE IT AND ITS PARAMETERS ON DISC 1
      DO 9 L=1,J
      WRITE(1)(STEM(L,K),K=1,17),NQ(L),NPT(L)
    9 CONTINUE
      WRITE(1)(IX(I),I=1,19)
      WRITE(1)(IX(I),I=20,22),(FX(K),K=1,16)
      WRITE(1)(FX(K),K=17,33)
      GO TO 3
   22 CONTINUE
      WRITE(1)(STEM(J,K),K=1,17),NQ(J),NPT(J)
      RETURN
      END
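
The consistency checks made by PDISC before an item is copied to disc can be sketched as below. The sketch only collects messages, whereas the subroutine also skips an item whose parameter cards fail a check; the function name `check_item` and the tuple layout of its arguments are assumptions.

```python
VALID_ENDINGS = {"C", "*", " ", "E"}


def check_item(cards, param1, param2):
    """Return a list of error messages for one item.

    cards  -- list of (item_id, ending) for each text card of the item
    param1 -- (card_number, item_id) of the first parameter card
    param2 -- (card_number, item_id) of the second parameter card
    """
    messages = []
    for n in range(len(cards)):
        item_id, ending = cards[n]
        if n > 0 and item_id != cards[n - 1][0]:
            previous = cards[n - 1][0]
            messages.append(
                f"mismatch of IDs within item: {item_id} and {previous}")
        if ending not in VALID_ENDINGS:
            messages.append(
                f"item {item_id} has incorrect ending {ending!r} "
                f"on card {n + 1}")

    item_id = cards[-1][0]
    card1, id1 = param1
    card2, id2 = param2
    if id1 != item_id:
        messages.append(
            f"mismatch of IDs between item {item_id} and parameter card {id1}")
    elif card2 <= card1:
        messages.append("parameter cards are out of order")
    elif id1 != id2:
        messages.append("parameter card IDs differ")
    return messages


print(check_item([(42, "C"), (42, " ")], (1, 42), (2, 42)))
```

Running every item through such checks when the bank is first loaded keeps keypunching errors from silently removing items from later searches.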
      SUBROUTINE SORT(NDISC,NITEM1,NITEM2,NITEM3,NITEM4,NUM,IC,IBLK,IA,
     1IE,NTOT,NTOTA,NTOTAL,NCI)
      DIMENSION IX(22),STEM(100,17),NQ(100),NPT(100),ND(100),FX(33)
C SIZE OF DO LOOP DETERMINED BY # OF ITEMS NEEDED
      DO 38 I=2,NDISC
      NCRT=0
C START OF EACH NEW DISC THESE STATEMENTS ARE WRITTEN
      IF(I.EQ.2) GO TO 6
      IF(I.EQ.3) GO TO 11
      IF(I.EQ.4) GO TO 16
    6 CONTINUE
      WRITE(6,2) NITEM1
    2 FORMAT(1H0,29X,'THE ABOVE ',I4,
     1' ITEMS MET THE FOUR RESTRICTIONS SUBMITTED.',/,30X,
     2'THE NUMBER OF THESE ITEMS IS LESS THAN THE QUOTA ASKED.')
      WRITE(6,7)
    7 FORMAT(1H ,29X,'TO MEET QUOTA,ITEMS MEETING THE FIRST THREE ',
     1'RESTRICTIONS ARE GIVEN.')
      IF(NTOT.GE.NUM)GO TO 9
      WRITE(6,8)
    8 FORMAT(30X,'HOWEVER,THIS IS STILL LESS THAN THE QUOTA.',/////)
      GO TO 21
    9 CONTINUE
      WRITE(6,10)
   10 FORMAT(30X,
     1'WITH THIS SECOND SET OF ITEMS THE QUOTA IS MET OR EXCEEDED.',
     2/////)
      GO TO 21
   11 CONTINUE
      WRITE(6,12)
   12 FORMAT(1H1,29X,'TO MEET QUOTA,ITEMS MEETING THE FIRST TWO ',
     1'RESTRICTIONS ARE NOW ALSO GIVEN.')
      IF(NTOTA.GE.NUM)GO TO 14
      WRITE(6,13)
   13 FORMAT(30X,'HOWEVER,THIS IS AGAIN LESS THAN THE QUOTA.',/////)
      GO TO 21
   14 CONTINUE
      WRITE(6,15)
   15 FORMAT(30X,
     1'WITH THIS THIRD SET OF ITEMS THE QUOTA IS MET OR EXCEEDED.',
     2/////)
      GO TO 21
   16 CONTINUE
      WRITE(6,17)
   17 FORMAT(1H1,29X,'TO MEET QUOTA,ITEMS MEETING THE FIRST ',
     1'RESTRICTION ARE NOW ALSO GIVEN.')
      IF(NTOTAL.GE.NUM)GO TO 19
      WRITE(6,18)
   18 FORMAT(30X,'HOWEVER, THE QUOTA CANNOT BE MET WITH THE ',
     1'PRESENT STORAGE OF ITEMS.',/////)
      GO TO 21
   19 CONTINUE
      WRITE(6,20)
   20 FORMAT(30X,
     1'WITH THIS FOURTH SET OF ITEMS THE QUOTA IS MET OR EXCEEDED.',
     2/////)
   21 CONTINUE
C CHECK TO SEE IF WRITTEN OUT ALL ITEMS PUT ON RESPECTIVE
C DISC
      IF(I.EQ.2)GO TO 24
      IF(I.EQ.3)GO TO 25
      IF(I.EQ.4) GO TO 26
   24 CONTINUE
      IF(NCRT.EQ.NITEM2)GO TO 37
      GO TO 27
   25 CONTINUE
      IF(NCRT.EQ.NITEM3)GO TO 37
      GO TO 27
   26 CONTINUE
      IF(NCRT.EQ.NITEM4)GO TO 37
      GO TO 27
   27 CONTINUE
      J=0
   28 CONTINUE
C READ AND WRITE ITEM IN SAME MANNER AS IN MAIN LINE.
      J=J+1
      READ(I)(STEM(J,K),K=1,17),NQ(J),NPT(J)
      IF(NPT(J).NE.IBLK) GO TO 28
      READ(I)(IX(NI),NI=1,19)
      READ(I)(IX(NI),NI=20,22),(FX(KK),KK=1,16)
      READ(I)(FX(KK),KK=17,33)
      NCI=NCI+1
      WRITE(6,40) NCI,(STEM(1,K),K=1,17),NQ(1)
C (FIELD WIDTHS OF FORMAT 40 ARE PARTLY ILLEGIBLE IN THE LISTING)
   40 FORMAT(5X,I4,1X,17A4,10X,'ITEM ID IS ',I4)
      IF(NPT(1).EQ.IA) GO TO 41
      GO TO 43
   41 CONTINUE
      WRITE(6,42)
   42 FORMAT(1H )
   43 CONTINUE
      DO 34 L=2,J
      WRITE(6,31)(STEM(L,K),K=1,17)
   31 FORMAT(10X,17A4)
      IF(NPT(L).EQ.IA) GO TO 32
      GO TO 34
   32 CONTINUE
      WRITE(6,33)
   33 FORMAT(1H )
   34 CONTINUE
      CALL PARMTR (IX,FX)
   36 CONTINUE
      CALL TEXT
      NCRT=NCRT+1
      WRITE(6,22)
   22 FORMAT(1H1)
      GO TO 21
   37 CONTINUE
   38 CONTINUE
      RETURN
      END
      SUBROUTINE PARMTR(IX,FX)
      DIMENSION NAM(2,23),NCH(2,2),NTA(1,3),NES(1,3),NAREA(1,5),
     1NPROV(1,16),NVIDO(2,6),NGRAD(2,2),NAT(2,2),NLANG(2,3),IX(22),
     2FX(33)
      DATA NAM/'ALL ','    ','CVS ','    ','COL ','    ','DERM','    ',
     1'PHYS','CHEM','END ','MET ','GI  ','    ','HEMA','T   ','INF ',
     2'    ','SKEL','    ','NEUR','    ','PSYC','    ','PULM','    ',
     3'REN ','    ','THER','    ','ANAT','    ','BIOC','    ','GEN ',
     4'    ','LABM','ED  ','MICR','OB  ','PATH','    ','PHAR','M   ',
     5'PHYS','IOL '/
      DATA NCH/'SING',' ANS','MULT',' ANS'/
      DATA NTA/'FACT','COMP','PROB'/
      DATA NES/'ESS.','IMP.','UIMP'/
      DATA NAREA/'AMIB','NBME','CAN ','UK  ','OTH '/
      DATA NPROV/'ALTA','B.C.','DALH','LAVL','MCG ','MCM ','MAN ',
     1'MTRL','OTT ','QN  ','SASK','SHRB','TOR ','UWO ','CALG','MMRL'/
      DATA NVIDO/'LINE','    ','PHOT','O   ','COLO','R   ','SLID',
     1'E   ','MOVI','E   ','VIDE','O   '/
      DATA NGRAD/'GRAD','.   ','UGRA','D.  '/
      DATA NAT/'NAT.','EXAM','LOC.','EXAM'/
      DATA NLANG/'BI- ','LANG','ENG.','ONLY','FR. ','ONLY'/
C THIS SUBROUTINE CHECKS THE PARAMETER CARDS ACCOMPANYING
C EACH ITEM AND SUBSTITUTES THE NUMERIC VALUES WITH PROPER
C ALPHABETIC NAME.(SOME NUMERALS ARE PRINTED.)
      WRITE(6,37)
   37 FORMAT(///,39X,'PARAMETERS FOR THIS ITEM:',37X,
     1'REVIEWER PLEASE COMMENT ON:',//)
      J=IX(1)
      WRITE(6,1)(NAM(I,J),I=1,2)
    1 FORMAT(10X,'AREA OF SUBSPECIALTY:',2X,2A4,44X,'CONTENT',20X,
     1'OK?....')
      J=IX(5)
      IF(J.LE.0.OR.J.GT.23) GO TO 3
      WRITE(6,2)(NAM(I,J),I=1,2)
    2 FORMAT(10X,'SECOND AREA OF SUBSPECIALTY:',2X,2A4,44X,'SENSE',
     122X,'OK?....')
      GO TO 41
    3 CONTINUE
      WRITE(6,42)
   42 FORMAT(10X,'SECOND AREA OF SUBSPECIALTY:',54X,'SENSE',22X,
     1'OK?....')
   41 CONTINUE
      J=IX(2)
      IF(J.LE.0.OR.J.GT.2) GO TO 5
      WRITE(6,4)(NCH(I,J),I=1,2)
    4 FORMAT(10X,'TYPE OF QUESTION:',13X,2A4,44X,'GRAMMAR',20X,
     1'OK?....')
    5 CONTINUE
      J=IX(3)
      IF(J.LE.0.OR.J.GT.3) GO TO 7
      WRITE(6,6)(NTA(I,J),I=1,1)
    6 FORMAT(10X,'TAXONOMIC LEVEL:',14X,1A4,48X,'RESPONSES CORRECT',
     110X,'OK?....')
    7 CONTINUE
      J=IX(4)
      IF(J.LE.0.OR.J.GT.3) GO TO 9
      WRITE(6,8)(NES(I,J),I=1,1)
C (FORMAT 8 IS ILLEGIBLE IN THE LISTING)
    9 CONTINUE
      WRITE(6,44)
C (FORMAT 44 IS ILLEGIBLE IN THE LISTING)
   43 CONTINUE
      J=IX(6)
      IF(J.LE.0.OR.J.GT.5) GO TO 11
      WRITE(6,10)(NAREA(I,J),I=1,1)
   10 FORMAT(10X,'SOURCE:',23X,1A4,48X,'WRITTEN COMMENTS:')
      GO TO 46
   11 CONTINUE
      WRITE(6,45)
   45 FORMAT(10X,'SOURCE:',75X,'WRITTEN COMMENTS:')
   46 CONTINUE
      J=IX(7)
      IF(J.LE.0.OR.J.GT.16) GO TO 13
      WRITE(6,12)(NPROV(I,J),I=1,1)
   12 FORMAT(10X,'UNIVERSITY:',19X,1A4)
   13 CONTINUE
      J=IX(8)
      IF(J.LE.0.OR.J.GT.6) GO TO 15
      WRITE(6,14)(NVIDO(I,J),I=1,2),IX(9)
   14 FORMAT(10X,'AUDIO VISUAL MATERIAL NEEDED: ',2A4,
     1': IDENT.OF SOURCE:',I3,/)
   15 CONTINUE
      WRITE(6,16)
   16 FORMAT(10X,'CORRECT RESPONSE ALTERNATIVE:')
      WRITE(6,17)
   17 FORMAT(10X,'CHOICE 1  CHOICE 2  CHOICE 3  CHOICE 4  CHOICE 5')
      WRITE(6,18)(IX(I),I=10,14)
   18 FORMAT(15X,I1,9X,I1,9X,I1,9X,I1,9X,I1,/)
      J=IX(15)
      IF(J.LE.0.OR.J.GT.3)GO TO 31
      WRITE(6,30)(NLANG(I,J),I=1,2)
   30 FORMAT(10X,'THE QUESTION IS AVAILABLE IN ',2A4,/)
   31 CONTINUE
      WRITE(6,23) IX(16)
   23 FORMAT(10X,'NUMBER OF TIMES USED:',10X,I1)
      IF(IX(16).EQ.0) GO TO 40
      IF(IX(17).EQ.0) GO TO 20
      WRITE(6,19) IX(17)
   19 FORMAT(10X,'LAST YEAR QUESTION WAS USED:',2X,I2)
   20 CONTINUE
      IF(IX(18).EQ.0) GO TO 58
      WRITE(6,22)IX(18)
   22 FORMAT(10X,'QUESTION NUMBER ON LAST EXAM:',1X,I2,/)
   58 CONTINUE
      J=IX(19)
      IF(J.LE.0.OR.J.GT.2) GO TO 27
      WRITE(6,26)(NAT(I,J),I=1,2)
   26 FORMAT(53X,2A4)
   27 CONTINUE
      IF(IX(21).EQ.0) GO TO 29
      WRITE(6,28) IX(21)
   28 FORMAT(53X,'ID=',I3)
   29 CONTINUE
C MULTIPLE-ANSWER ITEMS ARE REPORTED FROM STATEMENT 49 ONWARD
      IF(IX(2).EQ.2) GO TO 49
      WRITE(6,47)(FX(K),K=1,2)
C (NUMERIC FIELD WIDTHS OF FORMATS 47, 48 AND 55 ARE PARTLY
C ILLEGIBLE IN THE LISTING)
   47 FORMAT(10X,'DIFFICULTY LEVEL OF THIS SINGLE-CORRECT-ANSWER ',
     1'TYPE OF QUESTION',/,10X,'AT THESE RECORDED TESTING YEARS:',/,
     220X,'FIRST     SECOND',/,21X,F4.2,6X,F4.2,/)
      WRITE(6,48)(FX(K),K=3,4)
   48 FORMAT(10X,'BISERIAL COEFFICIENT FOR THESE RECORDED TESTING ',
     1'YEARS:',/,20X,'FIRST     SECOND',/,21X,F4.2,6X,F4.2,/)
      GO TO 54
   49 CONTINUE
      WRITE(6,50)
   50 FORMAT(10X,'DIFFICULTY LEVELS AND BISERIAL COEFFICIENTS OF ',
     1'THIS MULTIPLE-ANSWER TYPE OF',/,10X,
     2'QUESTION FOR THE FIRST RECORDED TESTING YEAR:')
      WRITE(6,51)
   51 FORMAT(15X,'1ST CHOICE  2ND CHOICE  3RD CHOICE  4TH CHOICE  ',
     1'5TH CHOICE  TOTAL ITEM')
      WRITE(6,52)(FX(K),K=5,16)
   52 FORMAT(10X,3H'P',6X,6(F4.2,8X),/,10X,3H'R',6X,6(F4.2,8X),/)
      IF(IX(16).LT.2) GO TO 54
      WRITE(6,53)
   53 FORMAT(10X,'DIFFICULTY LEVELS AND BISERIAL COEFFICIENTS OF ',
     1'THIS MULTIPLE-ANSWER TYPE OF',/,10X,
     2'QUESTION FOR THE SECOND RECORDED TESTING YEAR:')
      WRITE(6,51)
      WRITE(6,52)(FX(K),K=17,28)
   54 CONTINUE
      WRITE(6,55)(FX(K),K=29,33)
   55 FORMAT(10X,'PROPORTION ON LAST TEST SELECTING THIS DISTRACTOR:',
     1/,10X,'FIRST   SECOND   THIRD   FOURTH   FIFTH',/,
     211X,F4.2,4X,F4.2,3X,F4.2,5X,F4.2,5X,F4.2,/)
   40 CONTINUE
      RETURN
      END
      SUBROUTINE TEXT
C THIS SUBROUTINE SIMPLY WRITES OUT A CHECK LIST FOR
C TEST COMMITTEE
      WRITE(6,1)
    1 FORMAT(//)
      WRITE(6,3)
    3 FORMAT(10X,'TEST COMMITTEE:',/,26X,'ACCEPT AS IS:',34X,
     1'YES....NO....',/,26X,'MODIFY AS NOTED AND ACCEPT:',20X,
     2'YES....NO....')
C (THE REMAINING LINES OF FORMAT 3, AND FORMAT 30 USED BELOW,
C ARE ILLEGIBLE IN THE LISTING)
      WRITE(6,30)
      RETURN
      END
UTILITY PROGRAM: CARD TO TAPE


To STACK tape: prepare these system cards and place before data.
Each card begins in col. 1; a "C" in col. 72 marks a card that is
continued, and the continuation begins in col. 16 of the next card.

                                                              col. 72
   col. 1                                                        |
1) //PRINT    EXEC PGM=IEBGENER
2) //SYSPRINT DD SYSOUT=A
3) //SYSIN    DD DUMMY
4) //SYSUT2   DD DSNAME=MEDS,LABEL=(1,SL),UNIT=SYSTP,            C
5) //            DCB=(RECFM=FB,BLKSIZE=7200,LRECL=80,DEN=2),     C
6) //            VOLUME=SER=T0025,DISP=(NEW,KEEP)
7) //SYSUT1   DD *
8) data cards to be stacked
9) /*

(RECFM on card 5 may alternatively be VB, F, or U.)

Dror BONG Dele x D 
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DIMENSION STEM(100 ,17),NQ(100) ,NPT( 100 ),IX(22),FX(33) 
DATA IC/'C'/,IBLK/' '/,IA/'*'/,IE/'E'/
3 CONTINUE 
J=0 
4 CONTINUE 
C READ # OF CARDS CONTAINING ITEM(STEM(J,K)) 
J=J+1 
READ(5,5)(STEM(J,K),K=1,17),NQ(J),NPT(J)
5 FORMAT(1X,17A4,4X,I4,2X,A1)
IF(J-1)63,63,59 
59 CONTINUE 
C CHECK FOR CONSISTENT ITEM ID # (NQ(J))
IF(NQ(J)-NQ(J-1))60,62,60 
60 CONTINUE 
WRITE(6,61)NQ(J),NQ(J-1)
61 FORMAT(/////,1X,'****NOTE ,NOTE ,NOTE ,NOTE****',/,6X,'MISMATCH OF ID
1S WITHIN ITEM: ',I4,' AND ',I4,/////)
62 CONTINUE 
63 CONTINUE 
C CHECK FOR INVALID CHARACTER IN NPT(J). ALLOW C,E,*, OR BLANK
IF(NPT(J).NE.IC.AND.NPT(J).NE.IBLK.AND.NPT(J).NE.IA.AND.NPT(J).NE.
1IE)GO TO 41
GO TO 43
41 CONTINUE 
WRITE(6,42) NQ(J), NPT(J), J 
42 FORMAT(/////,1X,'****NOTE NOTE NOTE NOTE NOTE',/,6X,'ITEM ',I4,'
1HAS THIS INCORRECT ALPHABETIC ENDING:(',A1,') ON CARD',I3,/////)
43 CONTINUE
C IF E END OF TAPE IS SENSED
IF(NPT(J).EQ.IE) GO TO 22
C END OF ITEM SENSED IF NPT(J) IS BLANK. THEN READS TWO PARAMETER CARDS 
C ACCOMPANYING EACH ITEM. 
IF(NPT(J).NE.IBLK) GO TO 4 
READ(5,6)(IX(I),I=1,22),(FX(K),K=1,16),NCARD1,ID1,(FX(K),K=17,33),
1NCARD2,ID2
6 FORMATCD2. ei 2(1 2511) 513,510.20 to Oe i Oro sO xl le 
C7 EO Dee. te 0) 
C CHECK FOR MATCH OF ITEM ID#(NQ(J)) AND FIRST PARAMETER CARD ID # (ID1) 
IF(NQ(J)-ID1)56 ,58,56 
56 CONTINUE 


WRITE(6,57) NQ(J),ID1
57 FORMAT(/////,1X,'****NOTE NOTE NOTE NOTE****',/,6X,'MISMATCH OF ID
1S BETWEEN ITEM AND PARAMETER CARD:',/,6X,'ITEM ID IS:',I4,/,6X,'PAR
2AMETER ID IS:',I4,/////)
GO TO 3


58 CONTINUE 
C CHECK FOR ORDERING OF PARAMETER CARDS (NCARD1 AND NCARD2)
IF(NCARD2-NCARD1)50,50,52
50 CONTINUE 
WRITE(6,51) NCARD1,ID1,NCARD2,ID2 
51 FORMAT(/////,1X,'****NOTE NOTE NOTE NOTE ',/,5X,'PARAMETER CARDS A
1RE OUT OF ORDER.',/,5X,'FIRST PARAMETER CARD READS:CARD ',I1,2X,'ITE
2M # ',I4,/,5X,'SECOND PARAMETER CARD READS: CARD ',I1,2X,'ITEM #',I
34,/////)
GO TO 3 
52 CONTINUE 
C CHECK FOR MATCH OF PARAMETER CARDS ID #'S (ID1 AND ID2)
IF(ID1-ID2)53,3,53
53 CONTINUE 
WRITE(6,51) NCARD1,ID1,NCARD2,ID2 
GO TO 3 
22 CONTINUE 
RETURN 
END 


***** END OF COMPILATION *****
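
The checks made by the listing above can be summarized in a short Python sketch. It is a restatement under assumptions and not a translation of the Fortran: the function name `check_item`, the tuple layout, and the message wording are hypothetical. Each item card is reduced to its ID number and its column-80 ending code, and each parameter card to its card number and ID number.

```python
# Sketch only: restates the consistency checks applied to one item.
VALID_ENDINGS = {"C", "*", " ", "E"}   # column-80 codes the listing allows


def check_item(cards, param1, param2):
    """Return a list of problems found in one item.

    cards  -- list of (item_id, ending) pairs, one per item card
    param1 -- (card_number, item_id) of the first parameter card
    param2 -- (card_number, item_id) of the second parameter card
    """
    problems = []
    for n in range(len(cards)):
        item_id, ending = cards[n]
        # every card of an item must carry the same ID number
        if n > 0 and item_id != cards[n - 1][0]:
            problems.append("mismatch of IDs within item: %d and %d"
                            % (item_id, cards[n - 1][0]))
        # column 80 may hold only C, E, * or blank
        if ending not in VALID_ENDINGS:
            problems.append("item %d has incorrect ending (%s) on card %d"
                            % (item_id, ending, n + 1))
    # as in the listing, the first parameter-card problem found ends the check
    if cards[-1][0] != param1[1]:
        problems.append("mismatch of IDs between item and parameter card")
    elif param2[0] <= param1[0]:
        problems.append("parameter cards are out of order")
    elif param1[1] != param2[1]:
        problems.append("parameter cards carry different IDs")
    return problems


good = check_item([(101, "C"), (101, "*"), (101, " ")], (1, 101), (2, 101))
swapped = check_item([(101, " ")], (2, 101), (1, 101))
bad_code = check_item([(101, "X"), (101, " ")], (1, 101), (2, 101))
print(good, swapped, bad_code)
```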
MEDSIRCH2 DIVISION OF EDUCATIONAL RESEARCH SERVICES
UNIVERSITY OF ALBERTA 




SEARCH FOR MEDICAL EXAMINATION QUESTIONS 
ROYAL COLLEGE OF PHYSICIANS AND SURGEONS 
DEPARTMENT OF INTERNAL MEDICINE 


PROGRAMMER: C.B.HAZLETT 


PURPOSE: 
1. READS MULTIPLE CHOICE ITEMS FROM TAPE SELECTING 
THOSE MEETING THE FIRST FOUR RESTRICTIONS USER REQUIRES. 
2. SELECTION DETERMINED BY ID NUMBER OF ITEM ACCOMPANYING 
EACH PARAMETER CARD. 


CARD INPUT: 
1. PARAMETER CARD 
2. ID NUMBERS (16I5)
SUBMIT CARDS 1 AND 2 IN PAIRS,
OR IF MORE THAN ONE CARD IS NEEDED 
FOR ID NUMBERS, SUBMIT IN GROUPS. 
3. LAST CARD: BLANK CARD 


DESCRIPTION OF PARAMETER CARD (6I5)

LID - NUMBER OF THIS CARD (MATCHES LED)

MED - AREA OF SUBSPECIALITY

- IF MED=1 ALLERGY, IMMUNOLOGY, SEROLOGY
=2 CARDIOVASCULAR
=3 COLLAGEN DISEASES
=4 DERMATOLOGY
=5 CHEMICAL OR PHYSICAL AGENTS
=6 ENDOCRINOLOGY AND METABOLISM
=7 GI
=8 HEMATOLOGY
=9 INFECTIOUS DISEASES
=10 MUSCULOSKELETAL
=11 NEUROLOGY
=12 PSYCHOLOGICAL MEDICINE
=13 PULMONARY
=14 RENAL
=15 THERAPEUTICS
=16 ANATOMY
=17 BIOCHEMISTRY
=18 GENETICS
=19 LABORATORY MEDICINE
=20 MICROBIOLOGY
=21 PATHOLOGY
=22 PHARMACOLOGY
=23 PHYSIOLOGY
NTYP - TYPE OF QUESTION
- IF NTYP=1 SINGLE ANSWER
=2 MULTIPLE ANSWER
NTAX - TAXONOMIC LEVEL
- IF NTAX=1 FACTUAL
=2 PROBLEM SOLVING
NCORE - CORE LEVEL
- IF NCORE=1 ESSENTIAL MATERIAL
=2 MORE IMPORTANT THAN UNIMPORTANT MATERIAL
=3 MORE UNIMPORTANT THAN IMPORTANT MATERIAL

NUM - NUMBER OF ITEMS DESIRED WITH THESE RESTRICTIONS


DESCRIPTION OF ID NUMBERS CARD(S) (16I5)
LED - NUMBER OF THIS SET (MATCHES LID)
NUMID - ID NUMBERS OF ITEMS TO BE SELECTED
- PUNCH IN FIELDS OF FIVE
- IF MORE THAN 15 ITEMS DESIRED FOR THIS SET
START ON NEXT CARD IN COLUMN 5
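
The card layouts described above can be illustrated with a short Python sketch. It assumes right-justified integers in five-column fields, that a blank field reads as zero, and that continuation ID cards carry sixteen ID numbers each; that last point follows from how a sixteen-field format is reused for later cards and is an assumption about the intended punching. The names `int_fields`, `read_request`, and `punch` are hypothetical.

```python
# Sketch only: decodes one request (a parameter card plus its ID cards).
def int_fields(card, count, width=5):
    """Split a card image into `count` integer fields; a blank field reads as 0."""
    card = card.ljust(count * width)
    return [int(card[i * width:(i + 1) * width].strip() or 0)
            for i in range(count)]


def read_request(param_card, id_cards):
    """Decode LID, MED, NTYP, NTAX, NCORE, NUM and the NUM requested IDs."""
    lid, med, ntyp, ntax, ncore, num = int_fields(param_card, 6)
    first = int_fields(id_cards[0], 16)
    led, ids = first[0], first[1:]          # first field of the first ID card is LED
    for card in id_cards[1:]:               # later cards hold 16 IDs each (assumed)
        ids.extend(int_fields(card, 16))
    return {"LID": lid, "LED": led, "MED": med, "NTYP": ntyp, "NTAX": ntax,
            "NCORE": ncore, "NUMID": ids[:num], "matched": lid == led}


def punch(values):
    """Build a card image from integers, five columns per field."""
    return "".join("%5d" % v for v in values)


# A request for 17 items needs a second ID card.
request = read_request(punch([1, 2, 1, 2, 1, 17]),
                       [punch([1] + list(range(101, 116))), punch([116, 117])])
print(request["NUMID"], request["matched"])
```

The `matched` flag corresponds to the program's comparison of LED with LID before it searches the tape.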


REMARKS 
LIMITATIONS 
-MAX 100 CARDS PER ITEM 
-MAX 9999 ITEMS 


SUBROUTINES 
PDISC 
PARMTR 
TEXT 


DIMENSION STEM(100,17),NQ(100),NPT(100),IX(22),FX(33),NAM(2,23),NC
1H(2,2),NTA(1,3),NES(1,3),NUMID(25)

DATA NAM/'ALL ',' Peeve rune Dineen, a! '\'DERM' ,! as 
IPHYS',!CHEM'!END ','MET ’,'GI .1,! '’HEMA','T  ' 'INF ',! 

9) LJ oKEL.,! Vo NEUR 2 VPS ee! 2 6G) Y UREN ie 
3! bere FEHR UuANAToo. BOC ae YGEN Eee Ya 
uBM','ED ‘','MICR','OB ','PATH',' '\'PHAR','M  ','PHYS','IOL ' 
Sep le/ AO b/, IBLK/ |. te, TB/ UG SNCH/' SINGS. ANS* ."MULT" )* ANS 
GLNEA;LEACE® slCGND! , UPROBY/ sNES(urss a's IMP, tUIMPY/ 


C DEFINITIONS:
C NCI - COUNTER FOR NUMBER OF ITEMS PRINTED
CALL PDISC
100 CONTINUE
NCI=0
REWIND 1
108 CONTINUE
READ(5,102)LID,MED,NTYP,NTAX,NCORE,NUM
102 FORMAT(6I5)
IF(LID.EQ.0)GO TO 27
READ(5,103)LED,(NUMID(JI),JI=1,NUM)
103 FORMAT(16I5)
IF(LED.EQ.LID)GO TO 106
WRITE(6,107)LID,LED 
107 FORMAT(1H0,15X,'MISMATCHED ID-PAIRS ON THESE PARAMETER CARDS : ',
1I5,' AND ',I5)
GO TO 108 

106 CONTINUE 
WRITE(6,2)(MED,(NAM(I,MED),I=1,2),NTYP,(NCH(I,NTYP),I=1,2),NTAX,(N
1TA(I,NTAX),I=1,1),NCORE,(NES(I,NCORE),I=1,1),NUM)
2 FORMAT(1H1,10X,'RESTRICTIONS IMPOSED: ',//,15X,'AREA OF SU
1BSPECIALTY',10X,I2,2X,'- ',2A4,/,15X,'TYPE OF QUESTION',15X,I1,2X,'
2- ',2A4,/,15X,'TAXONOMIC LEVEL',16X,I1,2X,'- ',1A4,/,15X,'CORE LEV
3EL',21X,I1,2X,'- ',1A4,/,15X,'NUMBER OF ITEMS REQUESTED',3X,I4,//)
3 CONTINUE 
J=0 
4 CONTINUE 
C READ # OF CARDS CONTAINING ITEM(STEM(J,K)) 
J=J+1 


READ(1)(STEM(J,K),K=1,17),NQ(J),NPT(J)
IF(NPT(J).EQ.IE) GO TO 100
C END OF ITEM SENSED IF NPT(J) IS BLANK. THEN READS TWO PARAMETER CARDS 
C ACCOMPANYING EACH ITEM. 
IF(NPT(J).NE.IBLK) GO TO 4 
READ(1)(IX(I),I=1,19)
READ(1)(IX(I),I=20,22),(FX(K),K=1,16)
READ(1)(FX(K),K=17,33)
DO 104 JI=1,NUM 
IF(NQ(1).EQ.NUMID(JI)) GO TO 105 
104 CONTINUE 
GO TO 3 
105 CONTINUE 
NCI=NCI + 1 
IF(NCI.NE.1)GO TO 29 
WRITE(6,28) 
28 FORMAT(1H0,30X,'THE FOLLOWING ITEMS MEET THE ABOVE RESTRICTIONS:',
WELL 
29 CONTINUE 
WRITE(6,33) NCI,(STEM(1,K),K=1,17),NQ(1)
33 FORMAT(2X,I4,'.',3X,17A4,10X,'ITEM ID IS:',I4)
C IF NPT(J) IS A * SKIP A LINE IN PRINTING
IF(NPT(1).EQ.IC) GO TO 35
WRITE(6, 36 ) 
36 FORMAT(1H ) 
35 CONTINUE 
DO 21 L=2,J
WRITE(6,20)(STEM(L,K),K=1,17)
20 FORMAT(10X,17A4)
IF(NPT(L).EQ.IC) GO TO 21
WRITE(6,101)
101 FORMAT(1H )
21 CONTINUE
CALL TEXT
WRITE(6,32)
32 FORMAT(1H1)
IF(NCI.GE.NUM)GO TO 100
CONTINUE

GO TO 3
27 CONTINUE
STOP 

END 


***** END OF COMPILATION *****


SUBROUTINE PDISC 
(see MEDSIRCH-1, Appendix B, p. 112)


SUBROUTINE PARMTR 
(see MEDSIRCH-1, Appendix B, p. 116) 


SUBROUTINE TEXT 
(see MEDSIRCH-1, Appendix B, p. 119) 
UPDATE DIVISION OF EDUCATIONAL RESEARCH SERVICES
UNIVERSITY OF ALBERTA 


UPDATES TAPE BANK THAT WILL BE USED IN 
SEARCH FOR MEDICAL EXAMINATION QUESTIONS 
ROYAL COLLEGE OF PHYSICIANS AND SURGEONS 

DEPARTMENT OF INTERNAL MEDICINE 


PROGRAMMER: C.B.HAZLETT 


PURPOSE: 
1. DELETES UNWANTED ITEMS IN BANK. 
2. REVISES INCORRECT PARAMETER CARDS FOR ITEMS. 
3. ADDS NEW ITEMS. 
4. DELETES THOSE ITEMS THAT ARE TO BE MODIFIED AND 
ADDS CORRECTED ITEM. 


CARD INPUT: 
1. TITLE CARD (20A4)
2. PARAMETER CARD (3I5)
3. IDS OF ITEMS TO BE DELETED. (OPTIONAL) (16I5)


4. PAIRS OF PARAMETER CARDS FOR ITEMS NEEDING 
PARAMETER MODIFICATION. (OPTIONAL). 

5. MULTIPLE CHOICE ITEMS AND ACCOMPANYING 
PARAMETER CARDS THAT ARE TO BE ADDED. 
INCLUDES ALL ITEMS BEING MODIFIED. 


DESCRIPTION OF PARAMETER CARD (3I5)

ITEMRE -NO. OF ITEMS TO BE DELETED. 
-INCLUDES THOSE NO LONGER WANTED 
AND THOSE BEING MODIFIED. 

ITPARM -NO. OF PAIRS OF PARAMETER CARDS 
BEING MODIFIED. 

ITEMAD -NO. OF ITEMS BEING ADDED. 
-INCLUDES THOSE NEW AND MODIFIED.


REMARKS 
LIMITATIONS 
-MAX 1000 ITEMS DELETED. 
-MAX 600 PAIRS OF PARAMETER CARDS MODIFIED. 
-NO LIMIT ON NO. OF ITEMS ADDED 
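
The update described above copies the old bank to a new one in a single pass. The following Python sketch restates that logic under assumptions: the bank is modelled as a list of dictionaries rather than as tape records, and the names `update_bank`, `delete_ids`, `new_params`, and `additions` are hypothetical. A modified item is handled as a deletion plus an addition, as the header states.

```python
# Sketch only: one-pass update of an item bank held as a list of dicts.
def update_bank(old_bank, delete_ids, new_params, additions):
    """Return the new bank.

    old_bank   -- list of items, each {"id": int, "cards": [...], "params": ...}
    delete_ids -- IDs of items to drop (unwanted items and items being modified)
    new_params -- {id: replacement parameter cards} for items kept otherwise unchanged
    additions  -- new and corrected items, appended after the old bank is copied
    """
    delete_ids = set(delete_ids)
    new_bank = []
    for item in old_bank:
        if item["id"] in delete_ids:
            continue                          # deleted items are not copied
        params = new_params.get(item["id"], item["params"])
        new_bank.append({"id": item["id"], "cards": item["cards"],
                         "params": params})
    new_bank.extend(additions)                # added items follow the copied ones
    return new_bank


old = [{"id": 1, "cards": ["A"], "params": "P1"},
       {"id": 2, "cards": ["B"], "params": "P2"},
       {"id": 3, "cards": ["C"], "params": "P3"}]
new = update_bank(old,
                  delete_ids=[2],
                  new_params={3: "P3-REVISED"},
                  additions=[{"id": 2, "cards": ["B2"], "params": "P2"}])
print([(item["id"], item["params"]) for item in new])
```

Because additions are written after the copied items, a modified item moves to the end of the bank in this sketch; the old list itself is left unchanged.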
DIMENSION STEM(100,17),NQ(100),NPT(100),IX(22),FX(33),IDRE(1000),
1IXX(100,22),FXX(100,33),NCRD1(100),NCRD2(100),IDD1(100),IDD2(1
200),TITLE(20)

DATA IC/'C'/,IBLK/' '/,IA/'*'/,IE/'E'/

C OLD TAPE IS 8, NEW TAPE IS 1
REWIND 8 
REWIND 1 
C READ AND WRITE OUT TITLE CARD 
READ(5,200 )TITLE 
200 FORMAT( 20A4) 
WRITE(6,201)TITLE
201 FORMAT(20X,20A4,///)
C READ PARAMETER CARD
READ(5,1)ITEMRE,ITPARM,ITEMAD
1 FORMAT(3I5)
C IF NOT REMOVING ANY ITEMS SKIP READING IN THE ID NO. FOR SUCH ITEMS
IF(ITEMRE.EQ.0)GO TO 333
READ(5,2)(IDRE(I),I=1,ITEMRE)
2 FORMAT(16I5)
WRITE(6,702)(IDRE(I),I=1,ITEMRE)
702 FORMAT(09X,' THESE ITEMS HAVE BEEN REMOVED : ',I5,/,(42X,I5))
333 CONTINUE 
C IF NO CHANGES IN PARAMETER SKIP READING CHANGES 
IF(ITPARM.EQ.0) GO TO 7 
C READ IN ALL PARAMETER CARDS THAT ARE USED AS MODIFICATIONS
DO 102 N=1,ITPARM
READ(5,103)(IXX(N,J),J=1,22),(FXX(N,K),K=1,16),NCRD1(N),IDD1(N),(F
1XX(N,K),K=17,33),NCRD2(N),IDD2(N)
LOS PORMATUD2 3 Ll. 2 (12511) 413,511 .211 212 211 Toy Iu GAD ox hie ial 7 
TAD te LA, TB) 
C CHECK TO SEE THAT PAIR OF PARAMETER CARDS ARE IN CORRECT ORDER.
IF(NCRD2(N)-NCRD1(N))80,80,82
80 CONTINUE
WRITE(6,81)NCRD1(N),IDD1(N),NCRD2(N),IDD2(N)
81 FORMAT(///,1X,'*** NOTE:',/,5X,'THESE PARAMETER CARDS WHICH ARE B
1EING USED AS MODIFICATIONS ARE OUT OF ORDER.',/,5X,'FIRST PARAMETE
2R CARD READS: CARD ',I1,' ITEM NO. ',I4,/,5X,'SECOND PARAMETER CARD
3 READS: CARD ',I1,' ITEM NO. ',I4,///)
C COUNTER FOR MISTAKES MADE IN THIS RUN. 
NWRONG=NWRONG+1 
82 CONTINUE 
IF(IDD1(N)-IDD2(N) )83,84,83 
83 CONTINUE 
WRITE(6,81)NCRD1(N) ,IDD1(N) ,NCRD2(N) , IDD2(N) 
NWRONG=NWRONG+1 
84 CONTINUE 
102 CONTINUE 
WRITE(6,703)(IDD1(N) ,N=1, ITPARM) 
703 FORMAT(/,10X,'PARAMETER CARDS OF THESE ITEMS ( INDICATED BY THEIR
1ID NUMBERS ) HAVE BEEN CHANGED:',I5,/,(93X,I5))
7 CONTINUE
105 CONTINUE
C READ ITEMS AND PARAMETER CARDS FROM OLD TAPE
J=0
106 CONTINUE
J=J+1
READ(8,110)(STEM(J,K),K=1,17),NQ(J),NPT(J)
110 FORMAT(1X,17A4,4X,I4,2X,A1)
C IF COLUMN 80 HAS AN E ON OLD TAPE THE END IS SENSED.
IF(NPT(J).EQ.IE)GO TO 122
C IF COLUMN 80 HAS A BLANK THEN END OF THIS ITEM
IF(NPT(J).NE.IBLK)GO TO 106
C IF END OF ITEM READ PAIR OF PARAMETER CARDS
READ(8,203)(IX(I),I=1,22),(FX(K),K=1,16),NCARD1,ID1,(FX(K),K=17,33
1),NCARD2,ID2
C CHECK TO SEE IF ITEM READ FROM OLD TAPE IS TO BE REMOVED
IF(ITEMRE.EQ.0)GO TO 120
DO 107 M=1,ITEMRE 
IF(NQ(1)-IDRE(M))107,105,107 
107 CONTINUE 
120 CONTINUE 
C IF ITEM NOT REMOVED OR MODIFIED WRITE ON NEW TAPE
DO 108 L=1,J
WRITE(1,110)(STEM(L,K),K=1,17),NQ(L),NPT(L)
108 CONTINUE
C CHECK FOR CORRECTION OF PARAMETER CARD
IF(ITPARM.EQ.0)GO TO 121 
DO 109 N=1,ITPARM 
IF(IDD1(N)-ID1) 109,111,109 
109 CONTINUE 
GO TO 121 
111 CONTINUE 
C WRITE CORRECTED PARAMETER CARDS.
WRITE(1,103)(IXX(N,J),J=1,22),(FXX(N,K),K=1,16),NCRD1(N),IDD1(N),(
1FXX(N,K),K=17,33),NCRD2(N),IDD2(N)
GO TO 105
121 CONTINUE
C WRITE OLD PARAMETER CARDS
WRITE(1,203)(IX(I),I=1,22),(FX(K),K=1,16),NCARD1,ID1,(FX(K),K=17,3
13),NCARD2,ID2
GO TO 105
122 CONTINUE
C IF NO ITEMS BEING ADDED OR MODIFIED, SKIP
IF(ITEMAD.EQ.0) GO TO 222 
3 CONTINUE 
J=0 
4 CONTINUE 
C READ # OF CARDS CONTAINING ITEM(STEM(J,K)) THAT IS TO BE ADDED TO TAPE
C (INCLUDING NEW AND REVISED ITEMS.)
J=J+1
READ(5 ,110 )(STEM(J ,K) ,K=1,17),NQ(J) ,NPT(J) 
IF(J-1)63,63,59 
59 CONTINUE 
C CHECK FOR CONSISTENT ITEM ID # (NQ(J)) 
IF(NQ(J)-NQ(J-1))60,62,60 
60 CONTINUE 
WRITE(6,61)NQ(J),NQ(J-1)
61 FORMAT(/////,1X,'****NOTE ,NOTE ,NOTE ,NOTE****',/,6X,'MISMATCH OF ID
1S WITHIN ITEM: ',I4,' AND ',I4,/////)
NWRONG=NWRONG+1 
62 CONTINUE 
63 CONTINUE 
C CHECK FOR INVALID CHARACTER IN NPT(J). ALLOW C,E,*, OR BLANK 
IF(NPT(J).NE.IC.AND.NPT(J).NE.IBLK.AND.NPT(J).NE.IA.AND.NPT(J).NE.
1IE)GO TO 41
GO TO 43 
41 CONTINUE 
WRITE(6,42) NQ(J), NPT(J), J 
42 FORMAT(/////,1X,'****NOTE NOTE NOTE NOTE NOTE',/,6X,'ITEM ',I4,' 
1HAS THIS INCORRECT ALPHABETIC ENDING:(',A1,') ON CARD',I3,/////)
NWRONG=NWRONG+1
43 CONTINUE
C IF E END OF TAPE IS SENSED
IF(NPT(J).EQ.IE) GO TO 222
C END OF ITEM SENSED IF NPT(J) IS BLANK. THEN READS TWO PARAMETER CARDS 
C ACCOMPANYING EACH ITEM. 
IF(NPT(J).NE.IBLK) GO TO 4 
READ(5,103)(IX(I),I=1,22),(FX(K),K=1,16),NCARD1,ID1,(FX(K),K=17,33
1),NCARD2,ID2
C CHECK FOR MATCH OF ITEM ID#(NQ(J)) AND FIRST PARAMETER CARD ID #
C (ID1)
IF(NQ(J)-ID1)56,58,56
56 CONTINUE
WRITE(6,57) NQ(J),ID1
57 FORMAT(/////,1X,'****NOTE NOTE NOTE NOTE****',/,6X,'MISMATCH OF ID
1S BETWEEN ITEM AND PARAMETER CARD:',/,6X,'ITEM ID IS:',I4,/,6X,'PAR
2AMETER ID IS:',I4,/////)
NWRONG=NWRONG+1 
GO TO 3 
58 CONTINUE 
C CHECK FOR ORDERING OF PARAMETER CARDS (NCARD1 AND NCARD2) 
IF(NCARD2-NCARD1 )50,50,52 
50 CONTINUE 
WRITE(6,51) NCARD1,ID1,NCARD2,ID2 
51 FORMAT(/////,1X,'****NOTE NOTE NOTE NOTE ',/,5X,'PARAMETER CARDS A
1RE OUT OF ORDER.',/,5X,'FIRST PARAMETER CARD READS:CARD ',I1,2X,'ITE
2M # ',I4,/,5X,'SECOND PARAMETER CARD READS: CARD ',I1,2X,'ITEM #',I
34,/////)
NWRONG=NWRONG+1 
GO TO 3 
52 CONTINUE 
C CHECK FOR MATCH OF PARAMETER CARDS ID #'S (ID1 AND ID2)
IF( ID1-ID2)53,54,53 
53 CONTINUE 
WRITE(6,51) NCARD1,ID1,NCARD2,ID2 
NWRONG=NWRONG+1 
GO TO 3 
54 CONTINUE 
DO 801 L=1,J
WRITE(1,110)(STEM(L,K),K=1,17),NQ(L),NPT(L)
IF(L.EQ.1) GO TO 803
GO TO 801 
803 CONTINUE 
WRITE(6,704) NQ(1) 
704 FORMAT(10X,'ITEM ',15,' HAS BEEN ADDED AS A NEW OR MODIFIED ITE 
1M.') 
801 CONTINUE 
WRITE(1,103)(IX(I),I=1,22),(FX(K),K=1,16),NCARD1,ID1,(FX(K),K=17,3
13),NCARD2,ID2
GO TO 3
C WRITE E IN COLUMN 80 FOR PURPOSE OF SENSING END ON NEW TAPE
222 WRITE(1,110)(STEM(1,K),K=1,17),NQ(1),NPT(1)
IF(NWRONG.GT.0) GO TO 501 
GO TO 503 
501 CONTINUE 
WRITE( 6,500 )NWRONG 
500 FORMAT(///,1X,'*** NOTE:',/,5X,'THIS ATTEMPT TO UPDATE DATA FILE H 
1AS NOT BEEN ACCURATELY DONE.',/,5X,'THERE ARE ',I4,' MISTAKES MADE
2.REGARD ABOVE MESSAGES TO CORRECT AND RUN THIS PROGRAM AGAIN.',//)
GO TO 378
503 CONTINUE
WRITE(6,502)
502 FORMAT(///,1X,'*** NOTE:',/,5X,'TO BE SURE TAPE HAS BEEN PROPERLY
1UPDATED RUN MEDSIRCH2 ASKING FOR ITEMS THAT WERE MODIFIED OR ADDED
2.',//)
378 CONTINUE 
STOP 
END 
H.1 WITHOUT MODIFICATIONS


                  Amount of Core Needed for
                                                    Number of   Number of
                              Blocking              Tapes       Discs
Program        Executing      (2 buffers)   Total   Needed      Needed
CHECK             4K                          4K
UTILITY           1K             15K         16K       1
MEDSIRCH-1       10K             60K         70K       1           4
MEDSIRCH-2       10K             30K         40K       1           1
UPDATE           54K             30K         74K       2


The minimal requirements for using the implemented design are 74K of core
storage, 2 tapes, and 4 discs.
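
The minimal configuration quoted above is the largest requirement of any single program, taken separately for core, tapes, and discs, since the programs run one at a time. The Python sketch below checks that arithmetic. The total-core figures are taken from the table; the tape and disc counts are an assumed reading of cells that are not fully legible in this copy, and the variable names are hypothetical.

```python
# Sketch only: minimal configuration = per-resource maximum over all programs.
# Each entry is (total core in K, tapes needed, discs needed).
# Tape and disc counts are an assumed reading of the printed table.
requirements = {
    "CHECK":      (4, 0, 0),
    "UTILITY":    (16, 1, 0),
    "MEDSIRCH-1": (70, 1, 4),
    "MEDSIRCH-2": (40, 1, 1),
    "UPDATE":     (74, 2, 0),
}

core_k, tapes, discs = (max(column) for column in zip(*requirements.values()))
print(core_k, tapes, discs)
```

Under this reading, UPDATE sets the core and tape requirement and MEDSIRCH-1 sets the disc requirement.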
H.2 WITH MODIFICATIONS


Modifications possible: 


1. use tapes in lieu of discs in MEDSIRCH-1 and MEDSIRCH-2. 
2. do not block tapes or discs. 


3. do not retrieve items at some or all lower hierarchical levels 
Cia pp. 5/-98 ) in MEDSIRCH-1. 


Note: If modifications (1) and (2) are used efficiency will be poorer. 


                  Amount of Core Needed for
                                                    Number of   Number of
                              Blocking              Tapes       Discs
Program        Executing      (2 buffers)   Total   Needed      Needed
CHECK             4K                          4K
UTILITY           1K                          1K       1
MEDSIRCH-1       10K                         10K      1-4
MEDSIRCH-2       10K                         10K       1
UPDATE           54K                         54K       2


The minimal requirements for using the modified implemented design are 54K
of core storage and 2 tapes.